Abstract

We consider a general stochastic input-output dynamical system whose output
evolves in time as the solution to an Ito's stochastic differential equation with
functional coefficients, excited by an input process. This general class of stochastic systems
encompasses not only the classical communication channel models, but also a wide
variety of engineering systems appearing through a whole range of applications. For
this general setting we find analogues of known relationships linking input-output mutual
information and minimum mean causal and non-causal square errors, previously
established in the context of additive Gaussian noise communication channels. Relationships
are established not only in terms of time-averaged quantities; their
time-instantaneous, dynamical counterparts are presented as well. The problem of appropriately
introducing in this general framework a signal-to-noise ratio notion, expressed
through a signal-to-noise ratio parameter, is also taken into account, identifying conditions
for a proper and meaningful interpretation.
1 Introduction



Consider the widely used communication system model known as the standard additive 
white Gaussian noise channel, described by 

$$Y_t^r = \sqrt{r}\int_0^t X_s\,ds + W_t, \qquad t \in [0,T], \tag{1}$$

where $r \in [0,\infty)$ is the signal-to-noise ratio parameter, $T \in (0,\infty)$ is a fixed time-horizon,
$X = (X_t)_{t\in[0,T]}$ is the transmitted random signal or channel input, $W = (W_t)_{t\in[0,T]}$ is an
independent standard Brownian motion or Wiener process representing the noisy transmission
environment, and $Y^r = (Y_t^r)_{t\in[0,T]}$ is the received random signal or channel output,
corresponding to the respective value of the signal-to-noise ratio parameter $r$.

Of central importance from an information theoretical point of view is the input-output 
mutual information, i.e., the mutual information between the processes $X$ and $Y^r$, denoted
by $I(r)$. (Precise mathematical definitions are deferred to the next section.) On the
other hand, of central importance from an estimation theoretical point of view are the
causal and non-causal minimum mean square errors, in estimating or smoothing $X$ at time
$t \in [0,T]$, denoted by $\mathrm{cmmse}_X(t,r)$ and $\mathrm{ncmmse}_X(t,r)$, respectively. Input-output mutual
information encloses a measure of how much coded information can be reliably transmitted 
through the channel for the given input source, whereas the causal and non-causal minimum 
mean square errors indicate the level of accuracy that can be reached in the estimation of 
the transmitted message at the receiver, based on the causal or noncausal observation of 
an output sample path, respectively. 

Interesting results on the relationship between filter maps and likelihood ratios in the 
context of the additive white Gaussian noise channel have been available in the literature 
for a while (see for example [1] and references therein). An interesting specific result linking 
information theory and estimation theory in this same Gaussian channel context, concretely, 
input-output mutual information and causal minimum mean square error, is Duncan's 
theorem [2] stating, under appropriate finite average power conditions, the relationship 

$$I(r) = \frac{r}{2}\int_0^T \mathrm{cmmse}_X(s,r)\,ds, \qquad r \in [0,\infty), \tag{2}$$

i.e., after dividing both sides by $T$, stating the proportionality (through the factor $r/2$) of
mutual information rate per unit time and time-average causal minimum mean square error.
It was recently shown by Guo et al. [3] that the previous relationship is not the only
property linking information theory and estimation theory in this Gaussian channel
setting, but that there also exists an important result involving input-output mutual
information and non-causal minimum mean square error, namely



$$\frac{d}{dr} I(r) = \frac{1}{2}\int_0^T \mathrm{ncmmse}_X(s,r)\,ds, \qquad r \in [0,\infty). \tag{3}$$



As pointed out by Guo et al. [3], an interesting relationship between causal and non-causal
minimum mean square errors can then be directly deduced from (2) and (3), giving

$$\int_0^T \mathrm{cmmse}_X(s,r)\,ds = \frac{1}{r}\int_0^r\!\!\int_0^T \mathrm{ncmmse}_X(s,u)\,ds\,du, \qquad r \in (0,\infty), \tag{4}$$

i.e., after dividing both sides by $T$ as before, stating the equality between the time-average
causal minimum mean square error and the time-average non-causal minimum mean square
error averaged in turn over the signal-to-noise ratio. Equations (2) to (4) can for example
be used to study asymptotics of input-output mutual information and minimum mean
square errors, and to find new representations of information measures [3].
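
As a sanity check of relations (2) to (4), the following minimal sketch evaluates them numerically for a toy input that is not taken from the paper: a constant-in-time signal $X_t = X$ with $X$ a standard Gaussian random variable. For this assumed input the standard closed forms are $\mathrm{cmmse}_X(s,r) = 1/(1+rs)$, $\mathrm{ncmmse}_X(s,r) = 1/(1+rT)$ and $I(r) = \tfrac{1}{2}\log(1+rT)$. The function names and the midpoint quadrature rule are illustrative choices only.

```python
import math

def integrate(fn, a, b, n=20000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

# Assumed toy input (hypothetical example): X_t = X for all t, X ~ N(0, 1).
def cmmse(s, r):
    return 1.0 / (1.0 + r * s)       # error of E[X | Y_u, u <= s]

def ncmmse(s, r, T):
    return 1.0 / (1.0 + r * T)       # error of E[X | Y_u, u <= T]; constant in s

def mutual_info(r, T):
    return 0.5 * math.log(1.0 + r * T)

def check_relations(r, T, dr=1e-5):
    """Return the absolute residuals of relations (2), (3) and (4) for r > dr."""
    causal = integrate(lambda s: cmmse(s, r), 0.0, T)
    res2 = abs(mutual_info(r, T) - 0.5 * r * causal)
    # Central finite difference for dI/dr.
    dI = (mutual_info(r + dr, T) - mutual_info(r - dr, T)) / (2.0 * dr)
    res3 = abs(dI - 0.5 * integrate(lambda s: ncmmse(s, r, T), 0.0, T))
    # Average over the SNR of the time-integrated non-causal error.
    avg = integrate(
        lambda u: integrate(lambda s: ncmmse(s, u, T), 0.0, T, 50), 0.0, r, 2000
    ) / r
    res4 = abs(causal - avg)
    return res2, res3, res4
```

Calling `check_relations(2.0, 3.0)` returns three residuals that should all be negligible, up to quadrature and finite-difference error.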

An increasing necessity of considering general stochastic models has arisen during the 
last decades in the stochastic systems modelling community, not just from a communication 
systems standpoint, but from a wide variety of applications demanding the consideration of 
general stochastic input-output dynamical systems described by Ito's stochastic differential 
equations of the form 

$$Y_t^r = \sqrt{r}\int_0^t F(s,X,Y^r)\,ds + \int_0^t G(s,Y^r)\,dW_s, \qquad t \in [0,T], \tag{5}$$

with $X$ the input stochastic process to the system, $r$ a non-negative real parameter (to be
interpreted further in subsequent sections), $Y^r$ the corresponding system output stochastic
process$^1$, and $F$ and $G$ given (time-varying) non-anticipative functionals, i.e., with
$F(t,X,Y^r)$ depending on the random paths of $X$ and $Y^r$ only up to time $t$, and similarly
for $G(t,Y^r)$. Note that since $W$ is an infinite variation process, the integral

$$\int_0^t G(s,Y^r)\,dW_s$$

is an Ito's stochastic integral and not a standard pathwise Lebesgue-Stieltjes integral.
For the input process $X$, the corresponding system output $Y^r$ then evolves in time as the
solution to the stochastic differential equation (5). (Once again, we defer mathematical
precision to subsequent sections.) From a modelling point of view, the flexibility offered


$^1$ To ease notation we simply write $Y^r$, instead of for example $Y^{r,X}$, the input process $X$ being clear
from the context.
by the general model (5) captures a vast collection of system output stochastic behaviors,
as for example the class of strong Markov processes [4]. As mentioned, general stochastic
input-output dynamical systems such as the one portrayed by (5) appear in a wide variety of
stochastic modelling applications. They are usually obtained by a weak-limit approximation
procedure, where a sequence of properly scaled and normalized underlying stochastic
models is considered and shown to converge, in the sense of weak convergence (convergence
in distribution) of stochastic processes [5-8], to the solution of a corresponding stochastic
differential equation. Just to name a few, some examples are applications to adaptive antennas, channel
equalizers, adaptive quantizers, hard limiters, and synchronization systems such as standard
phase-locked loops and phase-locked loops with limiters [8]. They have also become
extremely useful in heavy-traffic approximations of stochastic networks of queues in opera- 
tions research and communications [6,9-15], where they are usually brought into the picture 
along with the Skorokhod (or reflection) map constraining a given process to stay inside 
a certain domain or spatial region [6,16], and in mathematical economics (option pricing 
and the Black-Scholes formula, arbitrage theory, consumption and investment problems, 
insurance and risk theory, etc.) and stochastic control theory [17-20]. 

The diffusion$^2$ models so obtained offer two main modelling advantages. On the one hand,
they usually wash out, in the limit, non-fundamental model details, providing mathematical
tractability and leading to a diffusion model that captures the main aspects and
trade-offs involved. On the other, they have the enormous advantage of taking the modelling
setting to the stochastic analysis framework, where the whole machinery of stochastic
calculus is available.

From a purely communication systems modelling viewpoint, it is worth emphasizing that
a general stochastic input-output dynamical system such as (5) encompasses all standard
communication Gaussian channel models as particular cases, such as the white Gaussian
noise channel (with/without feedback) or its extension to the colored Gaussian noise case.
These particular instances will be mathematically described in subsequent sections. It
is also worth mentioning that though more sophisticated mathematical frameworks have
been considered in the literature, as for example an infinite dimensional Gaussian setting
[21] with the associated Malliavin's stochastic analysis tools [22,23], the essentially white
Gaussian nature of the noise has remained untouched by most. In this regard, the main
tools considered to establish relationships such as (3) and (4) usually depend critically on a
Lévy structure$^3$ for the noisy term$^4$ and, specifically, on its independent increment property



$^2$ A strong Markov process with continuous sample paths is generally termed a diffusion.
$^3$ Recall that a process with stationary independent increments is termed a Lévy process.
$^4$ Following the communication systems jargon, we refer to the integral $\int_0^t G(s,Y^r)\,dW_s$ as the noise
term. Further interpretations along this line are discussed in the next section.
such as in the purely Brownian motion noisy term case where$^5$ $G = \sigma \in \mathbb{R}$ (a constant) in
(5). The flexibility of an Ito's stochastic integral with general functional $G$ in (5) allows
for a much greater generality of stochastic behaviors, including non-Lévy ones.

The main objective of this paper is to establish links between information theory and
estimation theory in the general setting of a stochastic input-output dynamical system
described by (5). Specifically, it is shown that a relationship analogous to (2) can be
written in this setting, thus extending the classical Duncan's theorem for standard additive white
Gaussian noise channels with and without feedback [2,24] to this generalized model. Proofs
are in the framework of absolute continuity properties of stochastic process measures,
underlying Girsanov's theorem [4,25]. Relationships (3) and (4) are also studied
in this generalized setting. As mentioned, they were shown to hold in the context of the
additive white Gaussian noise channel in the work of Guo et al. [3]. However, as also pointed
out in that work, they fail to hold when feedback is allowed in that purely Gaussian noise
framework. We show that this failure is due to the fact that a proper notion of a signal-to-noise
ratio expressed through a parameter such as $r$ in (1) cannot be properly introduced in
that case, and, by adequately identifying conditions for a signal-to-noise ratio parameter
to have a meaningful interpretation, we find relationships analogous to (3) and (4) holding
for a subclass of models contained in the general setting of (5). The analysis includes the
identification and proper definition of three important classes of related systems, namely
what we will come to call quasi-signal-to-noise, signal-to-noise and strong-signal-to-noise
systems.

Another aspect adding scope of applicability to the results presented in the
present paper, in addition to the generality of the system model considered here, is
the fact that not only are relationships involving time-averaged quantities such as in (2) and
(4) above extended to this general setting, but time-instantaneous counterparts
are provided as well. This fact brings dynamical relationships into the picture, allowing one to write
general integro-partial-differential equations characterizing the different information and
estimation theoretic quantities involved. Dynamical relationships are usually absent in
the information theory context, being in general difficult to find. The results provided
thus extend not only the traditional Gaussian system framework, but also the customary
time-independent, static relationships setting where information and estimation theoretic
quantities are studied for stationary (usually Gaussian) system input processes [26-28], or
for non-stationary system inputs but in terms of time-averaged quantities [2,24].

$^5$ The process $\left(\int_0^t G(s,Y^r)\,dW_s\right)_{t\in[0,T]}$ is not a Lévy process unless $G$ is a fixed constant.

Finally, we mention that for the sake of simplicity in the exposition of the results we will
consider throughout the paper one-dimensional systems and processes. However, all the
results presented in the paper have indeed multi-dimensional counterparts. These and 
further possible extensions, with the corresponding related generalized results, will not be 
difficult to carry out by the reader in light of the computations developed in the paper, and 
therefore we will only mention the main ideas involved by the end of the paper without 
giving corresponding proofs. 

The organization of the paper is as follows. In Section 2 we introduce the mathematically
rigorous system model setup, including the model definition, the main general assumptions,
and the different information and estimation theoretic quantities involved, such as input-output
mutual information and causal and non-causal minimum mean square errors, as well
as important concepts from the general theory of stochastic processes such as the absolute
continuity of stochastic process measures. In Section 3 we establish the relationship linking
input-output mutual information and causal minimum mean square error for the general
dynamical input-output stochastic system considered in the paper, generalizing the known
result for the standard additive white Gaussian noise channel with/without feedback. In
Section 4 we identify conditions under which a proper notion of a signal-to-noise ratio
parameter can be introduced in our general system setting. We distinguish three major
subclasses of systems and give appropriate characterizations. In Section 5 we establish the
corresponding generalization of the relationship linking input-output mutual information
and non-causal minimum mean square error for an appropriate subclass of system models.
In Section 6 we provide the corresponding time-instantaneous counterparts of the previous
results. In Section 7 we comment on further model extensions and related results. Finally,
in Section 8 we briefly comment on the scope of the results presented.



2 Preliminary Elements 



This section provides the precise mathematical framework upon which the present work is
built. In addition to introducing a thorough mathematical definition of the dynamical
system model to be considered throughout, it also introduces the main concepts from
information theory and statistical signal processing appearing in subsequent sections, such
as the notion of mutual information between stochastic processes, the accompanying notion
of absolute continuity of measures induced by stochastic processes, and minimum mean-square
errors in estimating and smoothing stochastic processes.
2.1 System Model Definition



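Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $T \in (0,\infty)$ be fixed throughout, and $(\mathcal{F}_t)_{t\in[0,T]}$ be a
filtration on $\mathcal{F}$, i.e., a nondecreasing family of sub-$\sigma$-algebras of $\mathcal{F}$. We assume the filtration
$(\mathcal{F}_t)_{t\in[0,T]}$ satisfies the usual hypotheses [4], i.e., $\mathcal{F}_0$ contains all the $\mathbb{P}$-null sets of $\mathcal{F}$ and
$(\mathcal{F}_t)_{t\in[0,T]}$ is right-continuous. Also, let $W = (W_t, \mathcal{F}_t)_{t\in[0,T]}$ be a one-dimensional standard
Brownian motion$^6$ [17], and $(\mathcal{C}_T, \mathcal{B}_T)$ be the measurable space of functions in $\mathcal{C}_T$, the space
of all functions $f : [0,T] \to \mathbb{R}$ continuous on $[0,T]$, equipped with the $\sigma$-algebra $\mathcal{B}_T$ of
finite-dimensional cylinder sets in $\mathcal{C}_T$ [17], i.e.$^7$,

$$\mathcal{B}_T = \sigma\left(\left\{ C^{\Gamma}_{t_1,\ldots,t_n} : n \in \mathbb{Z}_+,\ \{t_i\}_{i=1}^n \subseteq [0,T],\ \Gamma \in \mathcal{B}(\mathbb{R}^n) \right\}\right)$$

where $\mathcal{B}(\mathbb{R}^n)$ denotes the collection of Borel sets in $\mathbb{R}^n$, $n \in \mathbb{Z}_+ = \{1,2,\ldots\}$, and

$$C^{\Gamma}_{t_1,\ldots,t_n} = \left\{ f \in \mathcal{C}_T : (f(t_1),\ldots,f(t_n)) \in \Gamma \right\}$$

for each $n \in \mathbb{Z}_+$, $\{t_i\}_{i=1}^n \subseteq [0,T]$, and $\Gamma \in \mathcal{B}(\mathbb{R}^n)$. In a similar way we introduce, for each
$t \in [0,T]$, the $\sigma$-algebra $\mathcal{B}_t$ of finite-dimensional cylinder sets in the space $\mathcal{C}_t$ of all functions
$f : [0,t] \to \mathbb{R}$ continuous on $[0,t]$, and, for $\mathcal{A}_T$ a given family of functions $f : [0,T] \to \mathbb{R}$,
the $\sigma$-algebras $\mathcal{B}_{\mathcal{A}_T}$ and $\mathcal{B}_{\mathcal{A}_t}$ of finite-dimensional cylinder sets in $\mathcal{A}_T$ and $\mathcal{A}_t$, respectively,
with

$$\mathcal{A}_t = \left\{ f|_{[0,t]} : f \in \mathcal{A}_T \right\}$$

and $f|_{[0,t]}$ the restriction of $f : [0,T] \to \mathbb{R}$ to the subinterval $[0,t]$.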

For each $r \in \mathbb{R}_+ = [0,\infty)$ we consider a stochastic process $Y^r = (Y_t^r, \mathcal{F}_t)_{t\in[0,T]}$, with
paths or trajectories in the measurable space $(\mathcal{C}_T, \mathcal{B}_T)$, and having Ito's stochastic differential

$$dY_t^r = \sqrt{r}\,F(t,X,Y^r)\,dt + G(t,Y^r)\,dW_t \tag{6}$$

with $Y_0^r = 0$, where

• the stochastic process $X = (X_t, \mathcal{F}_t)_{t\in[0,T]}$, with trajectories in a given measurable
space of functions $(\mathcal{A}_T, \mathcal{B}_{\mathcal{A}_T})$, is independent of $W$, and

• the functionals $F : [0,T] \times \mathcal{A}_T \times \mathcal{C}_T \to \mathbb{R}$ and $G : [0,T] \times \mathcal{C}_T \to \mathbb{R}$ are measurable
and non-anticipative, i.e., they are $\sigma(\mathcal{B}([0,T]) \times \mathcal{B}_{\mathcal{A}_T} \times \mathcal{B}_T)$- and $\sigma(\mathcal{B}([0,T]) \times \mathcal{B}_T)$-measurable$^8$,
respectively, and, for each $t \in [0,T]$, $F(t,\cdot,\cdot)$ and $G(t,\cdot)$ are $\sigma(\mathcal{B}_{\mathcal{A}_t} \times \mathcal{B}_t)$-
and $\mathcal{B}_t$-measurable, respectively, as well. In other words, the functionals $F$ and $G$ are
jointly measurable with respect to (w.r.t.) all their corresponding arguments, and
depend at each time $t \in [0,T]$ on $f \in \mathcal{A}_T$ and $g \in \mathcal{C}_T$ only through $f|_{[0,t]}$ and $g|_{[0,t]}$,
i.e., only on the pieces of trajectories

$$\{(f(s), g(s)) : s \in [0,t]\}.$$

$^6$ The notation $(Z_t, \mathcal{F}_t)_{t\in[0,T]}$ indicates that the stochastic process $(Z_t)_{t\in[0,T]}$ is $(\mathcal{F}_t)_{t\in[0,T]}$-adapted, i.e., $Z_t$
is $\mathcal{F}_t$-measurable for each $t \in [0,T]$. In the case of a Brownian motion $W = (W_t, \mathcal{F}_t)_{t\in[0,T]}$, it also indicates
that $W$ is a martingale on that filtration, coinciding then with what is also called in the literature a Wiener process
relative to $(\mathcal{F}_t)_{t\in[0,T]}$ [25].

$^7$ We write, as usual, $\sigma(\cdot)$ for the corresponding generated $\sigma$-algebra.



Conditions for properly interpreting $r \in \mathbb{R}_+$ as a signal-to-noise ratio (SNR) parameter
for system (6) will be discussed in Section 4.

As discussed in Section 1, we may interpret equation (6) as a general stochastic input-output
dynamical system with input stochastic process $X$ and output stochastic process
$Y^r$, for each given value of the parameter $r$, the output process $Y^r$ evolving in time $t \in (0,T]$
as an Ito's process [29] with differential given by (6). Though the scope of applicability
of a general dynamical system model such as (6) exceeds by far a purely communication
system setting, it is worth mentioning that from a classical communication channels point of
view we shall interpret $X$ as a random input message being imprinted in the "channel signal
component" $\sqrt{r}\,F\,dt$, received at the channel output embedded in the additive "channel
noisy term" $G\,dW_t$. The standard additive white Gaussian noise channel (AWGNC) is
obtained from (6) by taking

$$F(t,f,g) = f(t) \quad\text{and}\quad G(t,g) = 1,$$

for each $t \in [0,T]$, $f \in \mathcal{A}_T$, and $g \in \mathcal{C}_T$, i.e., with the corresponding output process or
"random received signal" evolving for $t \in (0,T]$ according to

$$dY_t^r = \sqrt{r}\,X_t\,dt + dW_t, \tag{7}$$

and $r \in \mathbb{R}_+$ the channel SNR$^9$. Along the same lines, note that when $G$ in (6) is allowed to depend
only on $t \in [0,T]$, and not on $Y^r$, the noisy term

$$\int_0^t G(s)\,dW_s, \qquad t \in [0,T],$$

$^8$ Similarly to $\mathbb{R}^n$, $\mathcal{B}([0,T])$ denotes the collection of Borel sets in the interval $[0,T]$.
$^9$ The interpretation of $r$ as an SNR parameter is discussed in full in Section 4.
is a zero-mean Gaussian process with covariance function given by [30]

$$\mathbb{E}\left[\int_0^{t_1} G(s)\,dW_s \int_0^{t_2} G(s)\,dW_s\right] = \int_0^{\min\{t_1,t_2\}} G^2(s)\,ds,$$

$t_1, t_2 \in [0,T]$, provided $G$ is square-integrable on $[0,T]$, i.e.,

$$\int_0^T G^2(s)\,ds < \infty.$$
Jo 

This case is usually known in the literature as the additive colored Gaussian noise channel. 
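
The covariance formula for the colored noise term can be checked by simulation. The sketch below is a minimal Monte Carlo illustration, not part of the paper: it approximates the Ito integral by left-point sums on a uniform grid and compares the empirical covariance with $\int_0^{\min\{t_1,t_2\}} G^2(s)\,ds$. The function names, the grid size, the number of paths and the choice $G(s) = 1 + s$ in the usage note are all assumptions made for illustration.

```python
import math
import random

def wiener_integral_path(G, T, n_steps, rng):
    """Left-point (Ito) sums approximating t -> int_0^t G(s) dW_s on a uniform grid."""
    dt = T / n_steps
    path = [0.0]
    for k in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over [k dt, (k+1) dt]
        path.append(path[-1] + G(k * dt) * dW)
    return path

def empirical_covariance(G, T, i1, i2, n_steps=50, n_paths=20000, seed=0):
    """Monte Carlo estimate of E[N_{t1} N_{t2}] at grid indices i1, i2 (zero-mean process)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        p = wiener_integral_path(G, T, n_steps, rng)
        acc += p[i1] * p[i2]
    return acc / n_paths

def exact_covariance(G2_primitive, t1, t2):
    """int_0^{min(t1,t2)} G^2(s) ds, given a primitive (antiderivative) of G^2."""
    return G2_primitive(min(t1, t2)) - G2_primitive(0.0)
```

For instance, with the hypothetical choice $G(s) = 1 + s$ on $[0,1]$, `empirical_covariance(lambda s: 1.0 + s, 1.0, 25, 50)` can be compared with `exact_covariance(lambda t: (1.0 + t) ** 3 / 3.0, 0.5, 1.0)`; the two agree up to sampling and discretization error.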



It is technically suitable to treat $W$ in (6) as a system input too, as is sometimes
the case when the stochastic system at hand is obtained by a weak limit procedure from a
properly scaled and normalized sequence of underlying system models [8,13]. The principle
of causality for dynamical systems [17] requires the output process $Y^r$ at time $t \in [0,T]$,
$Y_t^r$ ($Y_0^r = 0$), to depend only on the values

$$\{X_s, W_s : s \in [0,t]\},$$

i.e., only on the past history of $X$ and $W$ up to time $t$. (This requirement finds a precise
mathematical expression in the adaptedness condition (I) imposed below.) Hence the
non-anticipative nature imposed on the functionals $F$ and $G$.

For a fixed deterministic trajectory $x(\cdot) \in \mathcal{A}_T$ in place of $X$ in (6), we have the corresponding
output stochastic process, denoted as $Y^{r,x}$ for each $r$, evolving as a solution of
the stochastic differential equation (SDE) [4]

$$Y_t^{r,x} = \sqrt{r}\int_0^t F(s,x,Y^{r,x})\,ds + \int_0^t G(s,Y^{r,x})\,dW_s, \tag{8}$$

$t \in [0,T]$. When for each $t \in [0,T]$ and $g \in \mathcal{C}_T$ we have $F(t,x,g) = F_x(t,g(t))$ and
$G(t,g) = G(t,g(t))$, for some Borel-measurable functions $F_x : [0,T] \times \mathbb{R} \to \mathbb{R}$ and $G :
[0,T] \times \mathbb{R} \to \mathbb{R}$, $Y^{r,x}$ is indeed a diffusion process, i.e., a strong Markov process with
continuous sample paths on $[0,T]$ [31]. Though we are of course interested in the general
case when the input to the system is a stochastic process $X$ as in (6), rather than a fixed
trajectory $x$ as in (8), we refer to (6) as an SDE system, motivated by the above discussion.
In fact, for $X$ and $Y^r$ related as in (6), we may look at $Y^r$ as solving the SDE with random
drift coefficient

$$Y_t^r = \sqrt{r}\int_0^t B_X(\omega,s,Y^r)\,ds + \int_0^t G(s,Y^r)\,dW_s,$$
$t \in [0,T]$, where the random drift functional $B_X : \Omega \times [0,T] \times \mathcal{C}_T \to \mathbb{R}$ is given by

$$B_X(\omega,t,g) = F(t,X(\omega),g) \tag{9}$$

for each $t \in [0,T]$ and $g \in \mathcal{C}_T$. Note that $B_X$ is not only $\sigma(\mathcal{F} \times \mathcal{B}([0,T]) \times \mathcal{B}_T)$-measurable,
but also, for each $t \in [0,T]$, $B_X(\cdot,t,\cdot)$ is $\sigma(\mathcal{F}_t^X \times \mathcal{B}_t)$-measurable, where

$$\mathcal{F}_t^X = \sigma(\{X_s : s \in [0,t]\}),$$

$t \in [0,T]$, is the history of $X$ up to time $t$, i.e., the minimal $\sigma$-algebra on $\Omega$ making all the
random variables $\{X_s : s \in [0,t]\}$ measurable.
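
To make the role of the non-anticipative functionals concrete, the following is a minimal Euler-Maruyama sketch for a system of the form (6). It is an illustration under stated assumptions rather than a scheme taken from the paper: time is discretized on a uniform grid, the hypothetical callables `F` and `G` receive only the path prefixes available up to the current step (mirroring non-anticipativity), and the Brownian increments `dW` are supplied by the caller.

```python
import math

def euler_maruyama(F, G, x, dW, dt, r):
    """Euler-Maruyama sketch for dY = sqrt(r) F(t, X, Y) dt + G(t, Y) dW with Y_0 = 0.

    F(k, x_prefix, y_prefix) and G(k, y_prefix) only see the input and output
    paths up to grid step k, so they cannot anticipate the future.
    x must contain at least len(dW) samples of the input path.
    """
    y = [0.0]
    for k, dw in enumerate(dW):
        drift = math.sqrt(r) * F(k, x[: k + 1], y)   # uses X and Y up to step k only
        y.append(y[-1] + drift * dt + G(k, y) * dw)
    return y

# AWGNC special case: F(t, f, g) = f(t) and G(t, g) = 1.
def awgn_F(k, x_prefix, y_prefix):
    return x_prefix[-1]

def awgn_G(k, y_prefix):
    return 1.0
```

In practice `dW` would be drawn as independent Gaussian increments with variance `dt`; passing the increments explicitly keeps the sketch deterministic and easy to test. With `r = 0` the output reduces to the cumulative sum of the supplied increments.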

Throughout we shall assume the following conditions are satisfied. 



(I) For each $r \in \mathbb{R}_+$ the stochastic process $Y^r$ is the pathwise unique strong solution
of equation (6) [32,33]. It is strong in the sense that, for each $t \in [0,T]$, $Y_t^r$ is
measurable w.r.t. the $\sigma$-algebra

$$\mathcal{F}_t^{X,W} = \sigma(\{X_s, W_s : s \in [0,t]\}),$$

which represents the joint history of $X$ and $W$ up to time $t$, i.e., the minimal $\sigma$-algebra
on $\Omega$ making all the random variables $\{X_s, W_s : s \in [0,t]\}$ measurable.
Equivalently, the stochastic process $Y^r$ is adapted to the filtration $(\mathcal{F}_t^{X,W})_{t\in[0,T]}$. It
is pathwise unique in the sense that if $Y^r$ and $\tilde{Y}^r$ are two strong solutions of (6),
then $Y_t^r = \tilde{Y}_t^r$ for all $t \in [0,T]$, $\mathbb{P}$-almost surely, i.e.,

$$\mathbb{P}\left(Y_t^r = \tilde{Y}_t^r,\ t \in [0,T]\right) = 1.$$

(See Remark 2.1 below for the existence and uniqueness of such a solution.)
(II) The non-anticipative functionals $F$ and $G$ are such that

$$\int_0^T |F(t,f,g)|\,dt < \infty \quad\text{and}\quad \int_0^T G^2(t,g)\,dt < \infty,$$

for each $f \in \mathcal{A}_T$ and $g \in \mathcal{C}_T$.
(III) For each $t \in [0,T]$ and $f, g \in \mathcal{C}_T$,

$$|G(t,f) - G(t,g)|^2 \le K_1 \int_0^t |f(s) - g(s)|^2\,dL(s) + K_2\,|f(t) - g(t)|^2, \tag{10}$$

$$G^2(t,f) \le K_1 \int_0^t \left(1 + f^2(s)\right) dL(s) + K_2 \left(1 + f^2(t)\right), \tag{11}$$

and

$$G^2(t,f) \ge K > 0, \tag{12}$$

where $L : [0,T] \to \mathbb{R}$ is a non-decreasing, right-continuous function satisfying $L(t) \in
[0,1]$ for each $t \in [0,T]$, and $K$, $K_1$ and $K_2$ are finite constants. Equations (10), (11)
and (12) correspond to Lipschitz, linear growth and non-degeneracy conditions on
the non-anticipative functional $G$, respectively.

(IV) For each $r \in \mathbb{R}_+$,

where $\xi = (\xi_t, \mathcal{F}_t)_{t\in[0,T]}$ is the pathwise unique strong solution of the equation

$$d\xi_t = G(t,\xi)\,dW_t, \qquad \xi_0 = 0.$$

(Existence and uniqueness of $\xi$ follow from condition (III) and [25, Theorem 4.6,
p.128].)

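(V) For each $r \in \mathbb{R}_+$,

$$\int_0^T \mathbb{E}\left[|F(t,X,Y^r)|\right] dt < \infty \tag{13}$$

and

$$\mathbb{P}\left(\int_0^T \left(\mathbb{E}\left[F(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right)^2 dt < \infty\right) = 1,$$

where, for each $r \in \mathbb{R}_+$ and $t \in [0,T]$,

$$\mathcal{F}_t^{Y^r} = \sigma(\{Y_s^r : s \in [0,t]\}),$$

the history of $Y^r$ up to time $t$. Here, and throughout, $\mathbb{E}[\,\cdot \mid \cdot\,]$ denotes conditional
expectation, as usual.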

Remark 2.1. If the random drift functional $B_X$ in (9) satisfies Lipschitz and linear growth
conditions similar to those imposed on $G$ in (III), on a $\mathbb{P}$-almost sure basis of course
and with $K_1$, $K_2$ random variables and $L$ a stochastic process, then
the existence of a pathwise unique strong solution of (6) can be read off from [25, Theorem
4.6, p. 128]. We do not explicitly require such conditions though, but just assume the
corresponding existence and uniqueness in condition (I).
Remark 2.2. As the reader will easily verify, all the results in the paper hold if condition
(I) is weakened to just ask that, for each $r \in \mathbb{R}_+$, $Y^r$ is any strong solution of (6), i.e.,
to just assume the existence of each $Y^r$ as any given strong solution of equation (6). We
demand uniqueness in condition (I) for the sake of precision, as well as to properly interpret
(6) as an input-(unique) output dynamical system.

As will be detailed in subsequent sections, conditions (I) to (V), as well as the
assumption on the stochastic independence of processes $X$ and $W$, ensure the existence
of several densities or Radon-Nikodym derivatives between the measures induced by the
stochastic processes involved in their corresponding sample spaces of functions. These
Radon-Nikodym derivatives are introduced in the following subsection.

2.2 Absolute Continuity of Stochastic Process Measures

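Recall from the previous subsection that the stochastic processes $X = (X_t)_{t\in[0,T]}$ and
$Y^r = (Y_t^r)_{t\in[0,T]}$ (each $r \in \mathbb{R}_+$) have trajectories, or sample paths, in the measurable spaces
of functions $(\mathcal{A}_T, \mathcal{B}_{\mathcal{A}_T})$ and $(\mathcal{C}_T, \mathcal{B}_T)$, respectively. In the same way, the auxiliary process
$\xi = (\xi_t)_{t\in[0,T]}$, introduced previously in condition (IV), has sample paths in the measurable
space $(\mathcal{C}_T, \mathcal{B}_T)$. We denote by

$$\mu_X, \quad \mu_{Y^r} \quad\text{and}\quad \mu_\xi$$

the corresponding measures they induce in the measurable spaces $(\mathcal{A}_T, \mathcal{B}_{\mathcal{A}_T})$, $(\mathcal{C}_T, \mathcal{B}_T)$,
and $(\mathcal{C}_T, \mathcal{B}_T)$, respectively. Analogously, we denote by

$$\mu_{X,Y^r}$$

the (joint) measure induced by the pair of processes $(X, Y^r)$ in the measurable space $(\mathcal{A}_T \times
\mathcal{C}_T, \sigma(\mathcal{B}_{\mathcal{A}_T} \times \mathcal{B}_T))$.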

As mentioned at the end of the previous subsection, and as will be detailed
further in subsequent sections, conditions (I) to (V), as well as the assumption on the
stochastic independence of processes $X$ and $W$, ensure the absolute continuity, in fact
the mutual absolute continuity, of several of the aforementioned measures, and therefore
the existence of the corresponding Radon-Nikodym derivatives. In particular,

$$\mu_{X,Y^r} \sim \mu_X \times \mu_\xi \quad\text{and}\quad \mu_{Y^r} \sim \mu_\xi, \tag{14}$$

where, as usual, "$\sim$" denotes mutual absolute continuity of the corresponding measures
and $\mu_X \times \mu_\xi$ the product measure in $(\mathcal{A}_T \times \mathcal{C}_T, \sigma(\mathcal{B}_{\mathcal{A}_T} \times \mathcal{B}_T))$ obtained from $\mu_X$ and $\mu_\xi$ in
$(\mathcal{A}_T, \mathcal{B}_{\mathcal{A}_T})$ and $(\mathcal{C}_T, \mathcal{B}_T)$, respectively. From (14) it then follows that

$$\mu_{X,Y^r} \sim \mu_X \times \mu_{Y^r}$$

too. We denote the corresponding Radon-Nikodym derivatives by

$$\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_\xi]}(f,g), \quad \frac{d\mu_{Y^r}}{d\mu_\xi}(g), \quad\text{and}\quad \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(f,g),$$

$f \in \mathcal{A}_T$, $g \in \mathcal{C}_T$. Note they are $\sigma(\mathcal{B}_{\mathcal{A}_T} \times \mathcal{B}_T)$-, $\mathcal{B}_T$-, and $\sigma(\mathcal{B}_{\mathcal{A}_T} \times \mathcal{B}_T)$-measurable functionals,
respectively. For product measures, such as for example $\mu_X \times \mu_{Y^r}$, the differential $d[\mu_X \times
\mu_{Y^r}]$ is sometimes written in the literature also as $d\mu_X\,d\mu_{Y^r}$.

In addition, for each $t \in [0,T]$, we denote by $\mu_{Y^r,t}$ and $\mu_{\xi,t}$ the measures the restricted
processes $Y^r|_{[0,t]} = (Y_s^r)_{s\in[0,t]}$ and $\xi|_{[0,t]} = (\xi_s)_{s\in[0,t]}$ induce on $(\mathcal{C}_t, \mathcal{B}_t)$, respectively, by

$$\frac{d\mu_{Y^r}}{d\mu_\xi}(t,g), \qquad g \in \mathcal{C}_t, \tag{15}$$

the corresponding Radon-Nikodym derivative, and similarly for all the other measures and
processes above. In accordance with our previous notation, we omit $t$ in expressions of the
form (15) when $t = T$.

Finally, we denote by

$$\frac{d\mu_{Y^r}}{d\mu_\xi}(t,Y^r) \quad\text{and}\quad \frac{d\mu_{Y^r}}{d\mu_\xi}(t,\xi)$$

the $\mathcal{F}_t^{Y^r}$- and $\mathcal{F}_t^{\xi}$-measurable random variables, $t \in [0,T]$, obtained from the corresponding
substitution of $g \in \mathcal{C}_t$ in (15) by each sample path $(Y_s^r(\omega))_{s\in[0,t]}$, $\omega \in \Omega$, of the process
$Y^r|_{[0,t]}$, and similarly for all other processes and measures above.



2.3 Input-Output Mutual Information

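Let $\mathbb{R}^* = \mathbb{R} \cup \{\pm\infty\}$, $\Theta(\Omega,\mathcal{F},\mathbb{P})$ be the space of all $\mathbb{R}^*$-valued random variables $\theta$ on
$(\Omega,\mathcal{F},\mathbb{P})$, and $L^1(\Omega,\mathcal{F},\mathbb{P})$ be the space of all $\theta \in \Theta$ having finite expectation, i.e.,

$$L^1(\Omega,\mathcal{F},\mathbb{P}) = \{\theta \in \Theta : \mathbb{E}[|\theta|] < \infty\},$$

with $\mathbb{E}[\cdot]$ denoting expectation w.r.t. $\mathbb{P}$ and the usual measure-theoretic convention $0 \cdot [\pm\infty] =
0$.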
We make the following definition involving the processes $X = (X_t)_{t\in[0,T]}$ and $Y^r =
(Y_t^r)_{t\in[0,T]}$, $r \in \mathbb{R}_+$. Here, and throughout, logarithms are understood to be, without loss
of generality, to the natural base $e$, with the convention $\log[0] = -\infty$.



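Definition 2.1. If for each $r \in \mathbb{R}_+$ the condition

$$\log\left[\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r)\right] \in L^1(\Omega,\mathcal{F},\mathbb{P}) \tag{16}$$

is satisfied$^{10}$, we define the input-output mutual information, $I : \mathbb{R}_+ \to \mathbb{R}$, by

$$I(r) = \mathbb{E}\left[\log\left[\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r)\right]\right]. \tag{17}$$

In the same way, we define the instantaneous input-output mutual information$^{11}$, $I_i : [0,T] \times
\mathbb{R}_+ \to \mathbb{R}$, by

$$I_i(t,r) = \mathbb{E}\left[\log\left[\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(t,X,Y^r)\right]\right].$$

Note that $I(r) = I_i(T,r)$ for each $r \in \mathbb{R}_+$. Note also that we may alternatively write $I(r)$
as

$$I(r) = \int_{\mathcal{A}_T \times \mathcal{C}_T} \log\left[\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(f,g)\right] \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(f,g)\; d[\mu_X \times \mu_{Y^r}](f,g),$$

$r \in \mathbb{R}_+$, and similarly for $I_i(t,r)$, $(t,r) \in [0,T] \times \mathbb{R}_+$.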



Remark 2.3. For a given input process $X$, changing the value of $r \in \mathbb{R}_+$ in (6) changes
the output process $Y^r$, and thus changes the right hand side of (17) too. Hence the
notation $I(r)$, treating $r \in \mathbb{R}_+$ as the variable for a given input process $X$. The notation
$I_i(t,r)$ follows the same reasoning. We find this notation more appealing than for example
$I(X,Y^r)$ or $I_i(t,X,Y^r)$, especially in identifying the relevant variables to compute quantities
such as

$$\frac{d}{dr} I(r) \quad\text{and}\quad \frac{\partial^2}{\partial t\,\partial r} I_i(t,r)$$

in subsequent sections.



Sufficient conditions for (16) to be satisfied will be discussed in subsequent sections.

It is easy to check that $I$ and $I_i$ are indeed non-negative-valued, i.e.,

$$I : \mathbb{R}_+ \to \mathbb{R}_+ \quad\text{and}\quad I_i : [0,T] \times \mathbb{R}_+ \to \mathbb{R}_+.$$

$^{10}$ Note that, for each $r \in \mathbb{R}_+$, the left hand side of (16) is $\mathcal{F}^{X,Y^r} = \sigma(\{X_t, Y_t^r : t \in [0,T]\})$-measurable,
therefore $\mathcal{F}$-measurable too, and hence an element of $\Theta(\Omega,\mathcal{F},\mathbb{P})$.
$^{11}$ Note that condition (16) also implies that $I_i$ is well defined.

Definition 2.1 is motivated by the classical definition of mutual information in the
context of stochastic processes and stochastic systems [3,34,35], such as the AWGNC.
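
A finite-alphabet analogue may help fix ideas. In the sketch below, which is an illustration and not part of the paper's framework, the joint law of a pair of discrete random variables is a probability table, and the ratio $p(x,y)/(p(x)\,p(y))$ plays the role of the Radon-Nikodym derivative of the joint measure with respect to the product of the marginals. The mutual information is the expectation of the logarithm of that ratio under the joint law; the function name and the nested-list representation are assumptions.

```python
import math

def mutual_information(joint):
    """I = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ] for a joint pmf (nested list), in nats."""
    px = [sum(row) for row in joint]             # marginal of the first variable
    py = [sum(col) for col in zip(*joint)]       # marginal of the second variable
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                          # cells of zero mass contribute nothing
                total += p * math.log(p / (px[i] * py[j]))
    return total
```

For independent variables the ratio is identically one and the mutual information vanishes, while for two perfectly correlated fair bits it equals $\log 2$, in line with the non-negativity noted above.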



2.4 Minimum Mean-Square Errors 

A central role will be played in all the results to be stated in the paper by the measurable
non-anticipative functional

$$\phi : [0,T] \times \mathcal{A}_T \times \mathcal{C}_T \to \mathbb{R},$$

given by

$$\phi(t,f,g) = \frac{F(t,f,g)}{G(t,g)}$$

for each $t \in [0,T]$, $f \in \mathcal{A}_T$, and $g \in \mathcal{C}_T$. Note that from condition (III), equation (12), we have
$G(\cdot,\cdot) \neq 0$, and therefore $\phi$ is well defined.

Remark 2.4. From condition (V), equation (13), it follows that, for each $r \in \mathbb{R}_+$,

$$\mathbb{E}\left[|F(t,X,Y^r)|\right] < \infty$$

for Lebesgue-almost every $t \in [0,T]$. Since also, from condition (III), equation (12), we
have $|G(\cdot,\cdot)| \ge \sqrt{K} > 0$, we conclude that, for each $r \in \mathbb{R}_+$,

$$\mathbb{E}\left[|\phi(t,X,Y^r)|\right] < \infty,$$

for Lebesgue-almost every $t \in [0,T]$ too. Therefore, for any sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$ and each
$r \in \mathbb{R}_+$ the conditional expectation

$$\mathbb{E}\left[\phi(t,X,Y^r) \mid \mathcal{G}\right]$$

is a well defined and finite $\mathcal{G}$-measurable random variable (in fact an element of $L^1(\Omega,\mathcal{G},\mathbb{P}|_{\mathcal{G}})$,
with $\mathbb{P}|_{\mathcal{G}}$ denoting the restriction of $\mathbb{P}$ to $\mathcal{G}$ [36]), for Lebesgue-almost every $t \in [0,T]$ as
well. By defining it as $0$ on the remaining Lebesgue-null subset of $[0,T]$, henceforth
we treat it as a real-valued function in $t \in [0,T]$, for each $r \in \mathbb{R}_+$.



Having made the previous remark, we now introduce the following definition involving
the above introduced functional $\phi$, and the accompanying stochastic processes $X$, $Y^r = (Y_t^r)_{t\in[0,T]}$,
$r \in \mathbb{R}_+$.

Definition 2.2. For each $r \in \mathbb{R}_+$ we define the causal minimum mean-square error
(CMMSE) in estimating the stochastic process $\phi(\cdot,X,Y^r)$ at time $t \in [0,T]$ from the
observations $Y_s^r$, $s \in [0,t]$, denoted $\mathrm{cmmse}_\phi(t,r)$, by

$$\mathrm{cmmse}_\phi(t,r) = \mathbb{E}\left[\left(\phi(t,X,Y^r) - \mathbb{E}\left[\phi(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right].$$

Similarly, for each $r \in \mathbb{R}_+$ we define the non-causal minimum mean-square error (NCMMSE)
in smoothing the stochastic process $\phi(\cdot,X,Y^r)$ at time $t \in [0,T]$ from the observations $Y_s^r$,
$s \in [0,T]$, denoted $\mathrm{ncmmse}_\phi(t,r)$, by

$$\mathrm{ncmmse}_\phi(t,r) = \mathbb{E}\left[\left(\phi(t,X,Y^r) - \mathbb{E}\left[\phi(t,X,Y^r) \mid \mathcal{F}_T^{Y^r}\right]\right)^2\right].$$

In the same way, and slightly abusing notation, for each $r \in \mathbb{R}_+$, $t \in [0,T]$, and $s \in [0,t]$
we set

$$\mathrm{ncmmse}_\phi(t,s,r) = \mathbb{E}\left[\left(\phi(s,X,Y^r) - \mathbb{E}\left[\phi(s,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right],$$

the NCMMSE in smoothing the stochastic process $\phi(\cdot,X,Y^r)$ at time $s \in [0,t]$ from the
observations $Y_u^r$, $u \in [0,t]$, with $t \in [0,T]$ and the convention of omitting the first of its three
arguments when it equals $T$, i.e., $\mathrm{ncmmse}_\phi(T,\cdot,\cdot) = \mathrm{ncmmse}_\phi(\cdot,\cdot)$. Note that the quantities
just defined differ through the conditioning $\sigma$-algebras, and that $\mathrm{ncmmse}_\phi(t,t,r) =
\mathrm{cmmse}_\phi(t,r)$ for each $t \in [0,T]$ and $r \in \mathbb{R}_+$.



Remark 2.5. From Remark 2.4 it follows that, for any sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$ and each $r \in \mathbb{R}_+$,

$$\left(\phi(t,X,Y^r) - E[\phi(t,X,Y^r) \mid \mathcal{G}]\right)^2$$

is a well defined non-negative random variable for each $t \in [0,T]$, and therefore each of the three quantities introduced in Definition 2.2 above is a well defined $\mathbb{R}_+ \cup \{\infty\}$-valued function of its corresponding arguments, clearly jointly measurable. Note that the domain of $\mathrm{ncmmse}_\phi(\cdot,\cdot,\cdot)$ is the set $V \subset \mathbb{R}^3$ given by

$$V = \{(t,s,r) \in \mathbb{R}^3 : t \in [0,T],\ s \in [0,t],\ r \in \mathbb{R}_+\}.$$



3 Input-Output Mutual Information and CMMSE 

In this section we provide a result relating input-output mutual information, $I$, and CMMSE, $\mathrm{cmmse}_\phi$, for the general dynamical input-output system (6). The result generalizes the classical Duncan's theorem for AWGNCs with or without feedback [2,24]. It also provides a general condition guaranteeing the fulfilment of requirement (16) in Definition 2.1.
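Theorem 3.1. Assume that for each $r \in \mathbb{R}_+$ we have

$$\int_0^T \mathrm{cmmse}_\phi(t,r)\,dt < \infty.$$

Then for each $r \in \mathbb{R}_+$ we have

$$\log \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) \in L^1(\Omega,\mathcal{F},P), \tag{17}$$

and the following relationship between $I$ and $\mathrm{cmmse}_\phi$,

$$I(r) = \frac{r}{2}\int_0^T \mathrm{cmmse}_\phi(t,r)\,dt, \tag{18}$$

holds for each $r \in \mathbb{R}_+$ as well.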

Before giving the proof of the theorem we make the following remark. 
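Remark 3.1. Under a finite average power condition

$$\int_0^T E\left[F^2(t,X,Y^r)\right]dt < \infty, \quad r \in \mathbb{R}_+, \tag{19}$$

it follows that

$$\int_0^T \mathrm{cmmse}_\phi(t,r)\,dt < \infty, \quad r \in \mathbb{R}_+.$$

Indeed, from (19) and condition (III), equation (12), we have

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty, \quad r \in \mathbb{R}_+,$$

which implies, by standard properties of expectations and conditional expectations for random variables with finite second order moment [36], and with $\eta_t^r = \phi(t,X,Y^r)$ and $\hat{\eta}_t^r = E[\phi(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}]$, $r \in \mathbb{R}_+$, $t \in [0,T]$, that

$$\begin{aligned}
\int_0^T \mathrm{cmmse}_\phi(t,r)\,dt &= \int_0^T E\left[(\eta_t^r - \hat{\eta}_t^r)^2\right]dt \\
&\leq \int_0^T \left(\sqrt{E\left[(\eta_t^r)^2\right]} + \sqrt{E\left[(\hat{\eta}_t^r)^2\right]}\right)^2 dt \\
&\leq \int_0^T \left(2\sqrt{E\left[(\eta_t^r)^2\right]}\right)^2 dt \\
&= 4\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt \\
&< \infty, \quad r \in \mathbb{R}_+. \tag{20}
\end{aligned}$$

Relationship (18) had been previously proved in the special case of AWGNCs (with or without feedback [2,24]), under condition (19).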



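Proof. Let $r \in \mathbb{R}_+$ be fixed throughout the proof. From conditions (I) to (V), the fact that the processes $X$ and $W$ are independent, and [25, Lemma 7.6, p. 292] and [25, Lemma 7.7, p. 293], we have that

$$\mu_{X,Y^r} \sim \mu_X \times \mu_\xi \quad \text{and} \quad \mu_{Y^r} \sim \mu_\xi.$$

Therefore

$$\mu_{X,Y^r} \sim \mu_X \times \mu_{Y^r} \tag{21}$$

too, and, by [25, Theorem 7.23, p. 289],

$$\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) = \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_\xi]}(X,Y^r)\left(\frac{d\mu_{Y^r}}{d\mu_\xi}(Y^r)\right)^{-1},$$

with the right hand side of the above expression equaling

$$\exp\left\{\sqrt{r}\int_0^T \frac{F(t,X,Y^r) - \bar{F}(t,Y^r)}{G(t,Y^r)}\,d\bar{W}_t^r - \frac{r}{2}\int_0^T \frac{\left(F(t,X,Y^r) - \bar{F}(t,Y^r)\right)^2}{G^2(t,Y^r)}\,dt\right\},$$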

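$P$-almost surely, where the non-anticipative functional $\bar{F}$ satisfies, for Lebesgue-almost every $t \in [0,T]$,

$$\bar{F}(t,Y^r) = E\left[F(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right], \tag{22}$$

$P$-almost surely as well, and where $\bar{W}^r = (\bar{W}_t^r, \mathcal{F}_t^{Y^r})_{t \in [0,T]}$ is a standard Brownian motion given by

$$\bar{W}_t^r = \int_0^t \frac{dY_s^r - \sqrt{r}\,\bar{F}(s,Y^r)\,ds}{G(s,Y^r)}. \tag{23}$$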
Thus, we find that

$$\log \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) = \sqrt{r}\int_0^T \psi(t,X,Y^r)\,d\bar{W}_t^r - \frac{r}{2}\int_0^T \psi^2(t,X,Y^r)\,dt, \tag{24}$$

where

$$\psi(t,X,Y^r) = \frac{F(t,X,Y^r) - \bar{F}(t,Y^r)}{G(t,Y^r)}.$$

Note that $\bar{W}^r$, even though it is obviously adapted to the filtration $(\mathcal{F}_t)_{t \in [0,T]}$ (since $\mathcal{F}_t^{Y^r} \subseteq \mathcal{F}_t$), is a martingale$^{12}$, and in fact a standard Brownian motion, w.r.t. the filtration $(\mathcal{F}_t^{Y^r})_{t \in [0,T]}$, but not w.r.t. the filtration $(\mathcal{F}_t)_{t \in [0,T]}$ to which the integrand $(\psi(t,X,Y^r))_{t \in [0,T]}$ is adapted$^{13}$ (except in the trivial case when $X$ is not random but a fixed deterministic trajectory). $\bar{W}^r$ is in fact a semimartingale relative to the filtration $(\mathcal{F}_t)_{t \in [0,T]}$, i.e., the sum of an $(\mathcal{F}_t)_{t \in [0,T]}$-local martingale$^{14}$ and an $(\mathcal{F}_t)_{t \in [0,T]}$-adapted finite variation process$^{15}$. Indeed, from (23) and (6) we find

$$\begin{aligned}
d\bar{W}_t^r &= \frac{dY_t^r - \sqrt{r}\,\bar{F}(t,Y^r)\,dt}{G(t,Y^r)} \\
&= \sqrt{r}\,\psi(t,X,Y^r)\,dt + \frac{dY_t^r - \sqrt{r}\,F(t,X,Y^r)\,dt}{G(t,Y^r)} \\
&= \sqrt{r}\,\psi(t,X,Y^r)\,dt + dW_t \\
&= dV_t^r + dM_t, \tag{25}
\end{aligned}$$

with $(\mathcal{F}_t)_{t \in [0,T]}$-local martingale (in fact martingale) component

$$M_t = \int_0^t dW_s = W_t, \quad t \in [0,T], \tag{26}$$

and, from conditions (III) and (V), equations (12) and (13), respectively, with $(\mathcal{F}_t)_{t \in [0,T]}$-adapted finite variation component process

$$V_t^r = \sqrt{r}\int_0^t \psi(s,X,Y^r)\,ds, \quad t \in [0,T]. \tag{27}$$



$^{12}$ Recall a stochastic process $(Z_t)_{t \in [0,T]}$ is a martingale w.r.t. the filtration $(\mathcal{G}_t)_{t \in [0,T]}$ if it is adapted to that filtration and, for each $0 \leq s \leq t \leq T$, $E[|Z_t|] < \infty$ and $E[Z_t \mid \mathcal{G}_s] = Z_s$, $P$-almost surely.

$^{13}$ As will be discussed in Section 6, $\bar{W}^r$ can be made into an $(\mathcal{F}_t)_{t \in [0,T]}$-standard Brownian motion under an appropriate change of measure.

$^{14}$ Recall a stochastic process $(Z_t)_{t \in [0,T]}$ is a local martingale w.r.t. the filtration $(\mathcal{G}_t)_{t \in [0,T]}$ if there exists an increasing sequence of stopping times $\{\tau_n\}_{n=0}^{\infty} \subset [0,T]$ [4] such that each stopped process $(Z_{\min\{t,\tau_n\}})_{t \in [0,T]}$ is a martingale w.r.t. $(\mathcal{G}_t)_{t \in [0,T]}$.

$^{15}$ Recall a stochastic process $(Z_t)_{t \in [0,T]}$ is said to be of finite variation if, almost surely, all its paths or trajectories are finite variation functions on any subinterval of $[0,T]$ [37].
Therefore, from equations (24) to (27) we conclude

$$\log \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) = \sqrt{r}\int_0^T \psi(t,X,Y^r)\,dW_t + \frac{r}{2}\int_0^T \psi^2(t,X,Y^r)\,dt. \tag{28}$$



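Now, note that for each $t \in [0,T]$ we have

$$\psi(t,X,Y^r) = \frac{F(t,X,Y^r) - \bar{F}(t,Y^r)}{G(t,Y^r)} = \phi(t,X,Y^r) - \frac{\bar{F}(t,Y^r)}{G(t,Y^r)}$$

and, since $(G(t,Y^r))_{t \in [0,T]}$ is obviously adapted to the history $(\mathcal{F}_t^{Y^r})_{t \in [0,T]}$, from (22) we have, for Lebesgue-almost every $t \in [0,T]$,

$$\frac{\bar{F}(t,Y^r)}{G(t,Y^r)} = \frac{E\left[F(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]}{G(t,Y^r)} = E\left[\frac{F(t,X,Y^r)}{G(t,Y^r)} \,\Big|\, \mathcal{F}_t^{Y^r}\right] = E\left[\phi(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right],$$

$P$-almost surely. Thus, for Lebesgue-almost every $t \in [0,T]$ as well we have

$$\psi(t,X,Y^r) = \phi(t,X,Y^r) - E\left[\phi(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right], \tag{29}$$

$P$-almost surely, hence, by Fubini's theorem [37], and since $\psi^2 \geq 0$,

$$E\left[\int_0^T \psi^2(t,X,Y^r)\,dt\right] = \int_0^T E\left[\psi^2(t,X,Y^r)\right]dt = \int_0^T \mathrm{cmmse}_\phi(t,r)\,dt < \infty, \tag{30}$$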



and therefore, since also $W = (W_t, \mathcal{F}_t)_{t \in [0,T]}$ is a standard Brownian motion and $(\psi(t,X,Y^r))_{t \in [0,T]}$ is adapted to the same filtration $(\mathcal{F}_t)_{t \in [0,T]}$ w.r.t. which $W$ is a martingale, we conclude that

$$\left(\int_0^t \psi(s,X,Y^r)\,dW_s,\ \mathcal{F}_t\right)_{t \in [0,T]}$$

is a centered martingale [38], and then, in particular, that

$$E\left[\int_0^T \psi(t,X,Y^r)\,dW_t\right] = 0. \tag{31}$$
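Thus, from (28), (30) and (31) we conclude that

$$\log \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) \in L^1(\Omega,\mathcal{F},P)$$

and that

$$\begin{aligned}
I(r) &= E\left[\sqrt{r}\int_0^T \psi(t,X,Y^r)\,dW_t + \frac{r}{2}\int_0^T \psi^2(t,X,Y^r)\,dt\right] \\
&= \frac{r}{2}\,E\left[\int_0^T \psi^2(t,X,Y^r)\,dt\right] \\
&= \frac{r}{2}\int_0^T E\left[\psi^2(t,X,Y^r)\right]dt.
\end{aligned}$$

Equation (18) then follows from the previous expression in light of (29), proving the theorem. □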

Remark 3.2. The assumption in Theorem 3.1 implies the Lebesgue-almost everywhere finiteness of $\mathrm{cmmse}_\phi(\cdot,r)$ on $[0,T]$ for each $r \in \mathbb{R}_+$.

Remark 3.3. Under the assumption that

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty, \quad r \in \mathbb{R}_+,$$

it is also possible to give a proof of Theorem 3.1 by reducing system (6) to an AWGNC with feedback, which can be accomplished by using existence and uniqueness theorems for solutions of SDEs with general driving semimartingales and constructing an appropriate implicitly defined measurable non-anticipative functional, and then applying the known results for that case [24]. However, the proof given here, in addition to requiring a weaker assumption (see (20) in Remark 3.1), shows how explicit computations can be handled in the general case, which will be of use in subsequent sections.



As mentioned before, Theorem 3.1, which relates input-output mutual information ($I$) and CMMSE ($\mathrm{cmmse}_\phi$) for the general dynamical input-output system (6), generalizes the classical Duncan's theorem for AWGNCs with or without feedback [2,24]. Indeed, for the AWGNC with feedback we have $G \equiv 1$, therefore $\phi = F$, and hence equation (18) in Theorem 3.1 reduces to

$$I(r) = \frac{r}{2}\int_0^T E\left[\left(F(t,X,Y^r) - E\left[F(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right]dt,$$

which in turn reduces for the AWGNC without feedback, where in addition $F(t,X,Y^r) = X_t$ for each $t \in [0,T]$ (see equation (1)), to

$$I(r) = \frac{r}{2}\int_0^T E\left[\left(X_t - E\left[X_t \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right]dt,$$

with $r \in \mathbb{R}_+$ the channel SNR and

$$E\left[\left(X_t - E\left[X_t \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right]$$

the CMMSE in estimating $X$ at time $t \in [0,T]$, $X_t$, from the observations $Y_s^r$, $s \in [0,t]$. Note that in the general case

$$\phi(\cdot,X,Y^r) = \frac{F(\cdot,X,Y^r)}{G(\cdot,Y^r)}$$

plays the role of $X_\cdot$ (or $F(\cdot,X,Y^r)$) above.
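As a numerical sanity check of the Duncan-type relationship between $I$ and the CMMSE in the AWGNC without feedback, the following minimal Python sketch uses a toy input that is an assumption of this example and does not appear in the paper: a constant Gaussian input $X_t = X$ with $X \sim N(0,1)$. For that input, standard Gaussian estimation gives the causal error $1/(1+rt)$ and the mutual information $\frac{1}{2}\log(1+rT)$. All function names are illustrative.

```python
import math


def cmmse(t, r):
    """Causal MMSE for the toy input X_t = X, X ~ N(0, 1): 1 / (1 + r t)."""
    return 1.0 / (1.0 + r * t)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def mutual_information_via_cmmse(r, T):
    """Right-hand side of the relationship: (r / 2) * int_0^T cmmse(t, r) dt."""
    return 0.5 * r * trapezoid(lambda t: cmmse(t, r), 0.0, T)


def mutual_information_closed_form(r, T):
    """Mutual information between X and the output path for the toy input."""
    return 0.5 * math.log(1.0 + r * T)


if __name__ == "__main__":
    r, T = 2.0, 3.0
    print(mutual_information_via_cmmse(r, T), mutual_information_closed_form(r, T))
```

Under these assumptions the two printed values should agree up to the quadrature error of the trapezoidal rule.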



4 On an appropriate Notion of SNR 

In this section we discuss conditions under which the parameter $r \in \mathbb{R}_+$ in (6) can be properly interpreted as an SNR parameter for such a general input-output system, in analogy with the AWGNC case [3] described by (7). These conditions will allow us to establish in the next section a useful and important relationship between input-output mutual information, $I(\cdot)$, and NCMMSE, $\mathrm{ncmmse}_\phi(\cdot,\cdot) = \mathrm{ncmmse}_\phi(T,\cdot,\cdot)$, for the general dynamical input-output system (6), generalizing a known relationship holding for AWGNCs [3].

Consider the AWGNC without feedback, described by equation (7), i.e.,

$$dY_t^r = \sqrt{r}\,X_t\,dt + dW_t, \quad t \in (0,T],$$

with $X$ and $Y^r$ the channel input and output, respectively, for a given fixed value of the parameter $r \in \mathbb{R}_+$. Here $F(\cdot,X,Y^r) = X_\cdot$ and $G \equiv 1$. Then, the ratio between the instantaneous "signal component" power,

$$\left(\sqrt{r}\,F(\cdot,X,Y^r)\right)^2 = rX_\cdot^2,$$

and the instantaneous "noisy component" power,

$$G^2(\cdot,Y^r) \equiv 1,$$

is given by

$$\left(\frac{\sqrt{r}\,F(\cdot,X,Y^r)}{G(\cdot,Y^r)}\right)^2 = rX_\cdot^2, \tag{32}$$

i.e., it is proportional to $r$ for a given fixed input power level$^{16}$. Hence the interpretation of $r$ as an SNR channel parameter.

The interpretation of $r$ as an SNR channel parameter is not as straightforward as above for the standard AWGNC with feedback, described by the equation

$$dY_t^r = \sqrt{r}\,F(t,X,Y^r)\,dt + dW_t, \quad t \in (0,T].$$

Here, though $G \equiv 1$, we have

$$\left(\frac{\sqrt{r}\,F(\cdot,X,Y^r)}{G(\cdot,Y^r)}\right)^2 = rF^2(\cdot,X,Y^r),$$

and therefore $r$ cannot be properly interpreted as an SNR channel parameter since, for instance, it may very well happen that an increment in $r$ changes the corresponding output process $Y^r$ in such a way that, say, $rF^2(\cdot,X,Y^r)$ becomes even smaller.

It should be noted that treating $F(\cdot,X,Y^r)$ as a "net channel input" (instead of $X$) does not solve the above difficulty since, except in trivial cases, it is then not possible to maintain a fixed reference input power level $F^2(\cdot,X,Y^r)$ while varying $r \in \mathbb{R}_+$.

Motivated by the above discussion, and interpreting the general input-output dynamical system (6) from a classical communication systems point of view, as described in Section 2, we now make the following definitions identifying general classes of systems, belonging to the setting given by (6), where a notion of SNR can be properly introduced.

Definition 4.1. We say the dynamical input-output system (6) is a quasi-SNR-system if for any input process $X$ as in Section 2 and corresponding family of associated output processes $Y^r$, $r \in \mathbb{R}_+$, the family of stochastic processes

$$\left\{r\phi^2(\cdot,X,Y^r)\right\}_{r \in \mathbb{R}_+}$$

is $P$-almost surely non-decreasing, in the sense that for each $r_1, r_2 \in \mathbb{R}_+$ with $r_1 \leq r_2$,

$$r_1\phi^2(\cdot,X,Y^{r_1}) \leq r_2\phi^2(\cdot,X,Y^{r_2})$$



$^{16}$ Of course, and strictly speaking, what should be kept fixed in the random inputs case is the average input power, $E[X_t^2]\,dt$, with the corresponding interpretation of (32) also in terms of average quantities. However, that does not alter the present discussion.
$P$-almost surely, i.e., $P$-almost surely as well,

$$r_1\phi^2(t,X,Y^{r_1}) = r_1\frac{F^2(t,X,Y^{r_1})}{G^2(t,Y^{r_1})} \leq r_2\frac{F^2(t,X,Y^{r_2})}{G^2(t,Y^{r_2})} = r_2\phi^2(t,X,Y^{r_2})$$

for all $t \in [0,T]$.



Definition 4.2. We say the dynamical input-output system (6) is an SNR-system (with SNR parameter $r \in \mathbb{R}_+$) if there exists a measurable non-anticipative functional $\theta : [0,T] \times A_T \to \mathbb{R}_+$ such that

$$\phi^2(t,f,g) = \theta(t,f)$$

for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$. Note then that, for any $r \in \mathbb{R}_+$ and $X$ and $Y^r$ related by (6),

$$r\phi^2(t,X,Y^r) = \frac{rF^2(t,X,Y^r)}{G^2(t,Y^r)} = r\theta(t,X)$$

for all $t \in [0,T]$.



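Definition 4.3. We say the dynamical input-output system (6) is a strong-SNR-system (with SNR parameter $r \in \mathbb{R}_+$) if there exists a measurable non-anticipative functional $\eta : [0,T] \times A_T \to \mathbb{R}$ such that

$$\phi(t,f,g) = \eta(t,f)$$

for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$. Note then that, for any $r \in \mathbb{R}_+$ and $X$ and $Y^r$ related by (6),

$$\sqrt{r}\,\phi(t,X,Y^r) = \frac{\sqrt{r}\,F(t,X,Y^r)}{G(t,Y^r)} = \sqrt{r}\,\eta(t,X)$$

for all $t \in [0,T]$.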



It follows directly that a strong-SNR-system is an SNR-system, and that an SNR-system is a quasi-SNR-system. Also, an SNR-system where the functionals $F$ and $G$ have the same sign, i.e., where

$$F(t,f,g)\,G(t,g) \geq 0$$

for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$, is clearly a strong-SNR-system. Indeed, since then $\phi \geq 0$, we can take for $\eta$ in Definition 4.3

$$\eta(t,f) = \sqrt{\theta(t,f)},$$

with $\theta$ satisfying Definition 4.2.
Note that when system (6) is a strong-SNR-system, say with measurable non-anticipative functional $\eta : [0,T] \times A_T \to \mathbb{R}$ in Definition 4.3, it may be written as

$$dY_t^r = \sqrt{r}\,\eta(t,X)\,G(t,Y^r)\,dt + G(t,Y^r)\,dW_t,$$

i.e., as

$$dY_t^r = G(t,Y^r)\left[\sqrt{r}\,\eta(t,X)\,dt + dW_t\right],$$

and therefore interpreted as a cascade of two systems: an AWGNC followed by a semimartingale SDE system, the output of the first acting as the semimartingale integrator in the second, i.e.,

$$Y_t^r = \int_0^t G(s,Y^r)\,dZ_s^r, \quad t \in [0,T], \tag{33}$$

with

$$dZ_s^r = \sqrt{r}\,\eta(s,X)\,ds + dW_s. \tag{34}$$

Alternatively, $G(\cdot,\cdot)$ can be looked at as a functional feedback modulator factor, modulating the AWGNC differential output $dZ^r$. Note however that from (34) we recognize $(Z_t^r)_{t \in [0,T]}$ as an unbounded variation semimartingale, and therefore the integral in (33) corresponds to a semimartingale stochastic integral and not to a standard pathwise Lebesgue-Stieltjes integral.
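The cascade reading of a strong-SNR-system can be illustrated with a small Euler-Maruyama sketch. Everything specific in it is a hypothetical choice made for this example only: the coefficient `G` (Lipschitz and bounded away from zero, depending on the current output value), the input path $X_t = a\cos t$ with a Gaussian amplitude $a$, and the functional `eta`. The code steps the output once in direct form and once by first forming the AWGNC increment $dZ$ and then multiplying by $G$; at the discrete level both recursions coincide up to rounding.

```python
import math
import random


def G(y):
    """Hypothetical noise coefficient: Lipschitz and bounded away from zero."""
    return 1.5 + 0.5 * math.sin(y)


def eta(t, a):
    """Hypothetical input functional eta(t, X) for the input path X_t = a*cos(t)."""
    return a * math.cos(t)


def simulate(r, T=1.0, n=1000, seed=0):
    """Euler-Maruyama steps of the direct form and of the cascade form."""
    rng = random.Random(seed)
    dt = T / n
    a = rng.gauss(0.0, 1.0)          # random amplitude of the toy input
    y_direct, y_cascade, z = 0.0, 0.0, 0.0
    max_gap = 0.0
    for k in range(n):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Direct form: dY = sqrt(r)*eta*G(Y) dt + G(Y) dW
        y_direct += math.sqrt(r) * eta(t, a) * G(y_direct) * dt + G(y_direct) * dw
        # Cascade: AWGNC increment dZ first, then dY = G(Y) dZ
        dz = math.sqrt(r) * eta(t, a) * dt + dw
        z += dz
        y_cascade += G(y_cascade) * dz
        max_gap = max(max_gap, abs(y_direct - y_cascade))
    return y_direct, y_cascade, z, max_gap


if __name__ == "__main__":
    print(simulate(2.0))
```

The sketch only checks the algebraic equivalence of the two discretized forms; it says nothing about convergence rates of the scheme.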

As will be discussed in the next section, being a quasi-SNR-system is not enough to have the relationship between input-output mutual information, $I(\cdot)$, and NCMMSE, $\mathrm{ncmmse}_\phi(\cdot,\cdot) = \mathrm{ncmmse}_\phi(T,\cdot,\cdot)$, proved therein. However, for the sake of completeness, we provide in the following lemma and its corollary sufficient conditions for system (6) to be a quasi-SNR-system. Conditions for system (6) to be an SNR-system or a strong-SNR-system are explicit in the corresponding definitions, since they only involve the structure of the functional $\phi$.

Lemma 4.1. Assume the measurable non-anticipative functionals $F$ and $G$ in (6) are such that

$$F(t,f,g) = \tilde{F}(t,f,g(t)) \quad \text{and} \quad G(t,g) = \tilde{G}(t,g(t)) \tag{35}$$

for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$, where $\tilde{F}$ and $\tilde{G}$ are measurable mappings from $[0,T] \times A_T \times \mathbb{R}$ and $[0,T] \times \mathbb{R}$ into $\mathbb{R}$, respectively. Let $X$ be any input process as in Section 2 and assume that $\tilde{F}(t,X,\cdot)$ satisfies the following Lipschitz condition, $P$-almost surely,

$$\left|\tilde{F}(t,X,y_1) - \tilde{F}(t,X,y_2)\right|^2 \leq K_X\,|y_1 - y_2|^2 \tag{36}$$

for each $t \in [0,T]$ and all $y_1, y_2 \in \mathbb{R}$, where $K_X$ is a bounded random variable. Then, for each $0 \leq r_1 \leq r_2 < \infty$, the corresponding output processes $Y^{r_1} = (Y_t^{r_1})_{t \in [0,T]}$ and $Y^{r_2} = (Y_t^{r_2})_{t \in [0,T]}$ defined by (6) are such that

$$P\left(Y_t^{r_1} \leq Y_t^{r_2},\ t \in [0,T]\right) = 1.$$

Proof. First note that, since $K_X$ in (36) is bounded, we may assume, without loss of generality, that it is a finite constant (for a given $X$). Then, by using Ito's formula [17] and proceeding by arguments similar to those in the proof of [17, Proposition 2.18, p. 293], we find that, for each $t \in [0,T]$,

$$E\left[\Delta_t^+\right] \leq K_X\int_0^t E\left[\Delta_s^+\right]ds,$$

where, for each $t \in [0,T]$ as well,

$$\Delta_t^+ = \max\{\Delta_t, 0\}$$

and

$$\Delta_t = Y_t^{r_1} - Y_t^{r_2}.$$

Thus, from Gronwall's inequality [17] we conclude that

$$E\left[\Delta_t^+\right] = 0$$

for each $t \in [0,T]$, and therefore

$$P\left(Y_t^{r_1} \leq Y_t^{r_2}\right) = 1,$$

for each $t \in [0,T]$ too. The result now follows from the sample path continuity of the system outputs [17]. □

Remark 4.1. Though, as stated in Section 2, conditions (I) to (V) are assumed to hold throughout, the reader can verify that Lemma 4.1 still holds under just, in addition to (36) of course, condition (I) and condition (III), equation (10), which now takes the form

$$\left|\tilde{G}(t,y_1) - \tilde{G}(t,y_2)\right|^2 \leq D\,|y_1 - y_2|^2$$

for each $t \in [0,T]$ and all $y_1, y_2 \in \mathbb{R}$, with $D$ a finite constant.



Corollary 4.1. Assume the same hypotheses as in Lemma 4.1 and that the functional $\phi$, taking now the form $\phi = \tilde{F}/\tilde{G}$ with $\tilde{F}$ and $\tilde{G}$ defined by (35), is such that for each $t \in [0,T]$, $f \in A_T$, and $g_1, g_2 \in C_T$ with $g_1(s) \leq g_2(s)$ for all $s \in [0,T]$,

$$\frac{\tilde{F}^2(t,f,g_1(t))}{\tilde{G}^2(t,g_1(t))} \leq \frac{\tilde{F}^2(t,f,g_2(t))}{\tilde{G}^2(t,g_2(t))}.$$

Then the dynamical input-output system (6) is a quasi-SNR-system.



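Proof. Let $X$ be any input process as in Section 2, $0 \leq r_1 \leq r_2 < \infty$, and $Y^{r_1} = (Y_t^{r_1})_{t \in [0,T]}$ and $Y^{r_2} = (Y_t^{r_2})_{t \in [0,T]}$ be the corresponding output processes. Then, from Lemma 4.1 we have

$$P\left(Y_t^{r_1} \leq Y_t^{r_2},\ t \in [0,T]\right) = 1,$$

and therefore, $P$-almost surely,

$$r_1\phi^2(t,X,Y^{r_1}) = r_1\frac{\tilde{F}^2(t,X,Y_t^{r_1})}{\tilde{G}^2(t,Y_t^{r_1})} \leq r_2\frac{\tilde{F}^2(t,X,Y_t^{r_2})}{\tilde{G}^2(t,Y_t^{r_2})} = r_2\phi^2(t,X,Y^{r_2})$$

for all $t \in [0,T]$, proving the corollary. □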



5 Input-Output Mutual Information and NCMMSE 

In this section we establish a further useful relationship, now between input-output mutual information, $I(\cdot)$, and NCMMSE, $\mathrm{ncmmse}_\phi(\cdot,\cdot) = \mathrm{ncmmse}_\phi(T,\cdot,\cdot)$, for the general dynamical input-output system (6), provided a sufficiently strong proper notion of SNR is taken into account, namely system (6) being a strong-SNR-system. Recall from the previous section that an SNR-system is also a strong-SNR-system when the functionals $F$ and $G$ have the same sign, i.e., when

$$F(t,f,g)\,G(t,g) \geq 0$$

for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$.

Consider once again the AWGNC, where $F(t,f,g) = f(t)$ for all $t \in [0,T]$, $f \in A_T$, and $g \in C_T$, and where $G \equiv 1$, i.e., described by the equation

$$dY_t^r = \sqrt{r}\,X_t\,dt + dW_t, \quad t \in (0,T],$$

relating the channel input $X$ and the channel output $Y^r$ for each value of the parameter $r \in \mathbb{R}_+$. Then, provided$^{17}$

$$\int_0^T E\left[X_t^2\right]dt < \infty, \tag{37}$$

we have that [3] $I : \mathbb{R}_+ \to \mathbb{R}_+$ is differentiable in $\mathbb{R}_+$ (from the right at the origin) and that the relationship

$$\frac{d}{dr}I(r) = \frac{1}{2}\int_0^T \mathrm{ncmmse}_\phi(t,r)\,dt \tag{38}$$

holds for each $r \in \mathbb{R}_+$ (here $\phi(t,f,g) = f(t)$).
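The derivative relationship for the AWGNC can be checked numerically on a toy case. The sketch below assumes, purely for illustration, a constant Gaussian input $X_t = X$ with $X \sim N(0,1)$ (not an example from the paper), for which $I(r) = \frac{1}{2}\log(1+rT)$ and the smoothing error is $1/(1+rT)$ for every $t \in [0,T]$. It compares a central finite difference of $I$ with half the time integral of the NCMMSE; the function names are hypothetical.

```python
import math


def mutual_information(r, T):
    """I(r) for the toy input X_t = X, X ~ N(0, 1)."""
    return 0.5 * math.log(1.0 + r * T)


def ncmmse(t, r, T):
    """Smoothing error at time t given the whole output path on [0, T]."""
    return 1.0 / (1.0 + r * T)


def derivative_in_r(r, T, h=1e-5):
    """Central finite-difference approximation of dI/dr."""
    return (mutual_information(r + h, T) - mutual_information(r - h, T)) / (2.0 * h)


def half_integrated_ncmmse(r, T):
    """(1/2) * int_0^T ncmmse dt; the integrand is constant in t for this input."""
    return 0.5 * T * ncmmse(0.0, r, T)


if __name__ == "__main__":
    r, T = 2.0, 3.0
    print(derivative_in_r(r, T), half_integrated_ncmmse(r, T))
```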

However, as pointed out in Guo et al. [3], relationship (38) does not hold true in the AWGNC with feedback, described by the equation

$$dY_t^r = \sqrt{r}\,F(t,X,Y^r)\,dt + dW_t, \quad t \in (0,T], \tag{39}$$

even if, in the terminology introduced in the previous section, system (39) is a quasi-SNR-system.

The following result establishes that relationship (38) does indeed hold for system (6), provided it is a strong-SNR-system.

Theorem 5.1. Assume that system (6) is a strong-SNR-system and that the stochastic process $(\phi(t,X,Y^r))_{t \in [0,T]}$ has, for each $r \in \mathbb{R}_+$, finite average power, i.e.,

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty. \tag{40}$$

Then $I : \mathbb{R}_+ \to \mathbb{R}_+$ is differentiable in $\mathbb{R}_+$ (from the right at the origin) and the following relationship between $I(\cdot)$ and $\mathrm{ncmmse}_\phi(\cdot,\cdot)$,

$$\frac{d}{dr}I(r) = \frac{1}{2}\int_0^T \mathrm{ncmmse}_\phi(t,r)\,dt, \tag{41}$$

holds for each $r \in \mathbb{R}_+$.



Before giving the proof of the theorem we make the following remark. 



$^{17}$ As can be read off from the proof of [3, Lemma 5], the finite average power condition (37) is now required.
Remark 5.1. Since in Theorem 5.1 system (6) is required to be a strong-SNR-system, say with measurable non-anticipative functional $\eta : [0,T] \times A_T \to \mathbb{R}$ in Definition 4.3, we have

$$dY_t^r = \sqrt{r}\,\eta(t,X)\,G(t,Y^r)\,dt + G(t,Y^r)\,dW_t,$$

and therefore condition (40) and relationship (41) take the form

$$\int_0^T E\left[\eta^2(t,X)\right]dt < \infty$$

and

$$\frac{d}{dr}I(r) = \frac{1}{2}\int_0^T E\left[\left(\eta(t,X) - E\left[\eta(t,X) \mid \mathcal{F}_T^{Y^r}\right]\right)^2\right]dt,$$

respectively.



Proof. As in Remark 5.1 above, for each $r \in \mathbb{R}_+$ we may write

$$dY_t^r = \sqrt{r}\,\eta(t,X)\,G(t,Y^r)\,dt + G(t,Y^r)\,dW_t,$$

and therefore, since from condition (III), equation (12), we have that $G \neq 0$, we may as well write

$$\frac{dY_t^r}{G(t,Y^r)} = \sqrt{r}\,\eta(t,X)\,dt + dW_t.$$

Define, for each $r \in \mathbb{R}_+$, the process $Z^r = (Z_t^r)_{t \in [0,T]}$ by

$$dZ_t^r = \frac{dY_t^r}{G(t,Y^r)},$$

i.e., by

$$Z_t^r = \int_0^t \frac{dY_s^r}{G(s,Y^r)}, \quad t \in [0,T]. \tag{42}$$

Note that the process $Z^r$ has trajectories, the same as $Y^r$, in the measurable space $(C_T, \mathcal{B}_T)$. We may look at $Z^r$ as being the output of the system

$$dZ_t^r = \sqrt{r}\,\eta(t,X)\,dt + dW_t, \tag{43}$$

corresponding to the input $X$ and parameter $r$. System (43) is nothing but an AWGNC. Now, since

$$\int_0^T E\left[\eta^2(t,X)\right]dt = \int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty,$$
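from Theorem 3.1 applied to system (43) (see Remark 3.1) we obtain

$$\log \frac{d\mu_{X,Z^r}}{d[\mu_X \times \mu_{Z^r}]}(X,Z^r) \in L^1(\Omega,\mathcal{F},P)$$

for each $r \in \mathbb{R}_+$, and

$$E\left[\log \frac{d\mu_{X,Z^r}}{d[\mu_X \times \mu_{Z^r}]}(X,Z^r)\right] = \frac{r}{2}\int_0^T E\left[\left(\eta(t,X) - E\left[\eta(t,X) \mid \mathcal{F}_t^{Z^r}\right]\right)^2\right]dt$$

for each $r \in \mathbb{R}_+$ as well. Moreover [3], the previous expression is differentiable in $r \in \mathbb{R}_+$ (from the right at the origin) and

$$\frac{d}{dr}E\left[\log \frac{d\mu_{X,Z^r}}{d[\mu_X \times \mu_{Z^r}]}(X,Z^r)\right] = \frac{1}{2}\int_0^T E\left[\left(\eta(t,X) - E\left[\eta(t,X) \mid \mathcal{F}_T^{Z^r}\right]\right)^2\right]dt \tag{44}$$

holds for each $r \in \mathbb{R}_+$, with $\mathcal{F}_T^{Z^r} = \mathcal{F}_t^{Z^r}\big|_{t=T}$ and

$$\mathcal{F}_t^{Z^r} = \sigma\left(\{Z_s^r : s \in [0,t]\}\right),$$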

the history of $Z^r$ up to time $t \in [0,T]$. But, from the definition of the process $Z^r$ in (42), it is clear that

$$\mathcal{F}_t^{Z^r} \subseteq \mathcal{F}_t^{Y^r}$$

for each $t \in [0,T]$. In addition, we may also rewrite (42) as

$$dY_t^r = G(t,Y^r)\,dZ_t^r$$

and regard the previous expression as an SDE satisfied by $Y^r = (Y_t^r)_{t \in [0,T]}$, with "driving" semimartingale $Z^r = (Z_t^r)_{t \in [0,T]}$ given by (43). Then, by the existence and uniqueness theorem [4, Theorem 7, p. 253], and from the Lipschitz continuity requirement in condition (III), equation (10), we conclude that

$$\mathcal{F}_t^{Y^r} \subseteq \mathcal{F}_t^{Z^r}$$

for each $t \in [0,T]$, and therefore

$$\mathcal{F}_t^{Y^r} = \mathcal{F}_t^{Z^r}$$

for each $t \in [0,T]$ as well. Thus, from [25, Lemma 4.9, p. 114] we conclude the existence, for each $r \in \mathbb{R}_+$, of measurable non-anticipative functionals $a^r$ and $b^r$, from $[0,T] \times C_T$ into $\mathbb{R}$, such that

$$Z_t^r(\omega) = a^r(t, Y^r(\omega)) \quad \text{and} \quad Y_t^r(\omega) = b^r(t, Z^r(\omega))$$
for $(\lambda \times P)$-almost every $(t,\omega) \in [0,T] \times \Omega$, with $\lambda$ denoting Lebesgue measure in $[0,T]$ and $\lambda \times P$ the product measure of $\lambda$ and $P$. Hence, we may replace in (44) all occurrences of $Z^r$ by $Y^r$ to obtain

$$\frac{d}{dr}E\left[\log \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r)\right] = \frac{1}{2}\int_0^T E\left[\left(\eta(t,X) - E\left[\eta(t,X) \mid \mathcal{F}_T^{Y^r}\right]\right)^2\right]dt,$$

giving us (41) (see Remark 5.1) and thus proving the theorem. □



We have the following corollary to Theorem 5.1, generalizing the corresponding result for AWGNCs [3].

Corollary 5.1. Under the same assumptions as in Theorem 5.1, for each $r \in (0,\infty)$ we have

$$\overline{\mathrm{cmmse}}_\phi(r) = \frac{1}{r}\int_0^r \overline{\mathrm{ncmmse}}_\phi(u)\,du,$$

with

$$\overline{\mathrm{cmmse}}_\phi(\cdot) = \frac{1}{T}\int_0^T \mathrm{cmmse}_\phi(t,\cdot)\,dt$$

and

$$\overline{\mathrm{ncmmse}}_\phi(\cdot) = \frac{1}{T}\int_0^T \mathrm{ncmmse}_\phi(t,\cdot)\,dt$$

the time-averaged CMMSE and NCMMSE over $[0,T]$, respectively.

Proof. The result follows directly from Remark 3.1 and Theorems 3.1 and 5.1. □
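The averaged relationship in Corollary 5.1 can be illustrated on a toy case. The sketch below again assumes a constant Gaussian input $X_t = X$, $X \sim N(0,1)$, in an AWGNC, which is an assumption of this example and not taken from the paper. For that input the causal error is $1/(1+rt)$ and the smoothing error is $1/(1+rT)$, and both sides of the corollary should equal $\log(1+rT)/(rT)$. Function names and the trapezoidal integrator are illustrative.

```python
import math


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def averaged_cmmse(r, T):
    """Time average over [0, T] of the causal error 1 / (1 + r t)."""
    return trapezoid(lambda t: 1.0 / (1.0 + r * t), 0.0, T) / T


def averaged_ncmmse(u, T):
    """Time average over [0, T] of the smoothing error 1 / (1 + u T)."""
    return 1.0 / (1.0 + u * T)


def snr_averaged_ncmmse(r, T):
    """(1 / r) * int_0^r averaged_ncmmse(u) du."""
    return trapezoid(lambda u: averaged_ncmmse(u, T), 0.0, r) / r


if __name__ == "__main__":
    r, T = 2.0, 3.0
    print(averaged_cmmse(r, T), snr_averaged_ncmmse(r, T), math.log(1 + r * T) / (r * T))
```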



6 Dynamical Relationships 

It is apparent from the previous sections that the results already provided have time-instantaneous counterparts, and in particular that we have consistency in that

$$\frac{\partial}{\partial t}I_i(t,r) = \frac{r}{2}\,\mathrm{cmmse}_\phi(t,r)$$

and

$$\frac{\partial}{\partial r}I_i(t,r) = \frac{1}{2}\int_0^t \mathrm{ncmmse}_\phi(t,s,r)\,ds.$$

Remark 6.1 and Theorem 6.1 below show that this is indeed true. This fact brings dynamical relationships into the picture, allowing us to write general integro-partial differential equations, also given in this section, characterizing instantaneous input-output mutual information and MMSEs.



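Remark 6.1. Consider the condition

$$P\left(\int_0^T \psi^2(t,X,Y^r)\,dt < \infty\right) = 1, \quad r \in \mathbb{R}_+,$$

with

$$\psi(s,X,Y^r) = \phi(s,X,Y^r) - E\left[\phi(s,X,Y^r) \mid \mathcal{F}_s^{Y^r}\right],$$

which is implied by conditions (IV) and (V), and define the process $M^r = (M_t^r, \mathcal{F}_t)_{t \in [0,T]}$ by

$$M_t^r = \exp\left(-\sqrt{r}\int_0^t \psi(s,X,Y^r)\,dW_s\right) \times \exp\left(-\frac{r}{2}\int_0^t \psi^2(s,X,Y^r)\,ds\right).$$

Note that from the proof of Theorem 3.1 we have

$$\frac{d[\mu_X \times \mu_{Y^r}]}{d\mu_{X,Y^r}}(X,Y^r) = M_T^r \quad \text{and} \quad \frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(X,Y^r) = (M_T^r)^{-1},$$

$P$-almost surely. Also note that $(M_t^r, \mathcal{F}_t)_{t \in [0,T]}$ is a (strictly positive) supermartingale$^{18}$ [25], and, since furthermore

$$E\left[M_T^r\right] = E\left[\frac{d[\mu_X \times \mu_{Y^r}]}{d\mu_{X,Y^r}}(X,Y^r)\right] = 1 = E\left[M_0^r\right],$$

we have that $(M_t^r, \mathcal{F}_t)_{t \in [0,T]}$ is in fact a martingale [25]. Hence, for each $t \in [0,T]$ we have

$$E\left[M_T^r \mid \mathcal{F}_t\right] = M_t^r, \quad P\text{-almost surely}, \quad E\left[M_t^r\right] = 1,$$

and therefore the consistency property [25]

$$\frac{d[\mu_X \times \mu_{Y^r}]}{d\mu_{X,Y^r}}(t,X,Y^r) = M_t^r,$$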

$^{18}$ Recall a stochastic process $(Z_t)_{t \in [0,T]}$ is a supermartingale w.r.t. the filtration $(\mathcal{G}_t)_{t \in [0,T]}$ if it is adapted to that filtration and, for each $0 \leq s \leq t \leq T$, $E[|Z_t|] < \infty$ and $E[Z_t \mid \mathcal{G}_s] \leq Z_s$, $P$-almost surely.
$P$-almost surely as well, $t \in [0,T]$. Equivalently, in terms of $((M_t^r)^{-1}, \mathcal{F}_t)_{t \in [0,T]}$, with $Q^r$ the probability measure on $(\Omega, \mathcal{F}_T)$ given by

$$Q^r(A) = \int_A M_T^r\,dP, \quad A \in \mathcal{F}_T,$$

and with $E_{Q^r}[\cdot \mid \cdot]$ (resp., $E_{Q^r}[\cdot]$) denoting conditional expectation (resp., expectation) on $(\Omega, \mathcal{F}_T, Q^r)$, for each $t \in [0,T]$ we have [17]

$$E_{Q^r}\left[(M_T^r)^{-1} \mid \mathcal{F}_t\right] = (M_t^r)^{-1}E\left[(M_T^r)^{-1}M_T^r \mid \mathcal{F}_t\right] = (M_t^r)^{-1},$$

$Q^r$-almost surely. Therefore $((M_t^r)^{-1}, \mathcal{F}_t)_{t \in [0,T]}$ is a martingale on $(\Omega, \mathcal{F}_T, Q^r)$, with

$$E_{Q^r}\left[(M_t^r)^{-1}\right] = E_{Q^r}\left[(M_T^r)^{-1}\right] = \int_\Omega \frac{dP}{dQ^r}\,dQ^r = 1,$$

and, as before, for each $t \in [0,T]$ we thus have the consistency property [25]

$$\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(t,X,Y^r) = (M_t^r)^{-1},$$

$Q^r$-almost surely, hence $P$-almost surely too since in particular $P$ is absolutely continuous w.r.t. $Q^r$. Alternatively, and in connection with the proof of Theorem 3.1, note that since $M^r$ is an $(\mathcal{F}_t)_{t \in [0,T]}$-martingale we have, with $\bar{W}^r$ as defined there, and from Girsanov's theorem [4,17], that$^{19}$ $(\bar{W}_t^r, \mathcal{F}_t)_{t \in [0,T]}$ is a standard Brownian motion on $(\Omega, \mathcal{F}_T, Q^r)$, and therefore, by the same arguments as before, the process

$$(M_t^r)^{-1} = \exp\left(\sqrt{r}\int_0^t \psi(s,X,Y^r)\,d\bar{W}_s^r\right) \times \exp\left(-\frac{r}{2}\int_0^t \psi^2(s,X,Y^r)\,ds\right), \quad t \in [0,T], \tag{45}$$

is an $(\mathcal{F}_t)_{t \in [0,T]}$-martingale on $(\Omega, \mathcal{F}_T, Q^r)$. Indeed, (45) follows from the proof of Theorem 3.1 and, since

$$Q^r\left(\int_0^T \psi^2(t,X,Y^r)\,dt < \infty\right) = 1,$$

the right hand side of (45) is an $(\mathcal{F}_t)_{t \in [0,T]}$-supermartingale on $(\Omega, \mathcal{F}_T, Q^r)$ having constant expectation, hence a martingale [25].

$^{19}$ Note that, rather than considering the tuple $(\bar{W}_t^r, \mathcal{F}_t^{Y^r})_{t \in [0,T]}$ on the space $(\Omega, \mathcal{F}, P)$ as in the proof of Theorem 3.1, we now consider the tuple $(\bar{W}_t^r, \mathcal{F}_t)_{t \in [0,T]}$ on the space $(\Omega, \mathcal{F}_T, Q^r)$.



Having stated the previous remark, we now give the main results of this section.

Theorem 6.1. Assume that for each $r \in \mathbb{R}_+$ we have

$$\int_0^T \mathrm{cmmse}_\phi(t,r)\,dt < \infty.$$

Then, for each $r \in \mathbb{R}_+$ as well, $I_i(\cdot,r) : [0,T] \to \mathbb{R}_+$ is Lebesgue-almost everywhere differentiable in $[0,T]$ and, at each point $t \in [0,T]$ where this is so, we have

$$\frac{\partial}{\partial t}I_i(t,r) = \frac{r}{2}\,\mathrm{cmmse}_\phi(t,r).$$

Moreover, if in addition system (6) is a strong-SNR-system and for each $r \in \mathbb{R}_+$ we have

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty, \tag{46}$$

then, for each $t \in [0,T]$, $I_i(t,\cdot) : \mathbb{R}_+ \to \mathbb{R}_+$ is differentiable in $\mathbb{R}_+$ (from the right at the origin) with derivative given by

$$\frac{\partial}{\partial r}I_i(t,r) = \frac{1}{2}\int_0^t \mathrm{ncmmse}_\phi(t,s,r)\,ds \tag{47}$$

for each $r \in \mathbb{R}_+$.


Before giving the proof of the theorem we make the following remark. 

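Remark 6.2. As in Remark 5.1 in the previous section, note that condition (46) and relationship (47) take the form

$$\int_0^T E\left[\eta^2(t,X)\right]dt < \infty$$

and

$$\frac{\partial}{\partial r}I_i(t,r) = \frac{1}{2}\int_0^t E\left[\left(\eta(s,X) - E\left[\eta(s,X) \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right]ds,$$

respectively, with system (6) a strong-SNR-system satisfying Definition 4.3 with measurable non-anticipative functional $\eta$.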
Proof. Let $r \in \mathbb{R}_+$. From Remark 6.1 we have, for each $t \in [0,T]$,

$$\frac{d\mu_{X,Y^r}}{d[\mu_X \times \mu_{Y^r}]}(t,X,Y^r) = \exp\left(\sqrt{r}\int_0^t \psi(s,X,Y^r)\,d\bar{W}_s^r\right) \times \exp\left(-\frac{r}{2}\int_0^t \psi^2(s,X,Y^r)\,ds\right)$$

$P$-almost surely, from where, and proceeding by the same arguments as in the proof of Theorem 3.1,

$$I_i(t,r) = \frac{r}{2}\int_0^t \mathrm{cmmse}_\phi(s,r)\,ds.$$

The first part of the theorem then follows. The second part of the theorem also follows from the previous relationship by applying it to an AWGNC as in the proof of Theorem 5.1 when system (6) is a strong-SNR-system, and proceeding by the same arguments considered therein. The theorem is then proved. □
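The two time-instantaneous derivative formulas can be checked numerically on a toy case. The sketch below assumes, for illustration only, a constant Gaussian input $X_t = X$, $X \sim N(0,1)$, in an AWGNC (not an example from the paper), for which $I_i(t,r) = \frac{1}{2}\log(1+rt)$, the causal error is $1/(1+rt)$, and the smoothing error at any $s \in [0,t]$ given observations on $[0,t]$ is also $1/(1+rt)$. All names are hypothetical.

```python
import math


def instantaneous_mi(t, r):
    """I_i(t, r) for the toy input X_t = X, X ~ N(0, 1)."""
    return 0.5 * math.log(1.0 + r * t)


def cmmse(t, r):
    """Causal error at time t."""
    return 1.0 / (1.0 + r * t)


def ncmmse(t, s, r):
    """Smoothing error at time s in [0, t] given observations on [0, t]."""
    return 1.0 / (1.0 + r * t)


def time_side_gap(t, r, h=1e-5):
    """|dI_i/dt - (r/2) cmmse(t, r)| using a central difference in t."""
    d_dt = (instantaneous_mi(t + h, r) - instantaneous_mi(t - h, r)) / (2.0 * h)
    return abs(d_dt - 0.5 * r * cmmse(t, r))


def snr_side_gap(t, r, h=1e-5, n=2000):
    """|dI_i/dr - (1/2) int_0^t ncmmse(t, s, r) ds| using a central difference in r."""
    d_dr = (instantaneous_mi(t, r + h) - instantaneous_mi(t, r - h)) / (2.0 * h)
    step = t / n
    # Midpoint rule in s; exact here because the integrand does not depend on s.
    integral = sum(ncmmse(t, (k + 0.5) * step, r) for k in range(n)) * step
    return abs(d_dr - 0.5 * integral)


if __name__ == "__main__":
    print(time_side_gap(1.5, 2.0), snr_side_gap(1.5, 2.0))
```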



We have the following two corollaries to Theorem 6.1.

Corollary 6.1. Assume that system (6) is a strong-SNR-system and that for each $r \in \mathbb{R}_+$ we have

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right]dt < \infty.$$

Then, for each $r \in (0,\infty)$ and $t \in (0,T]$ we have

$$\overline{\mathrm{cmmse}}_\phi(t,r) = \frac{1}{r}\int_0^r \overline{\mathrm{ncmmse}}_\phi(t,u)\,du, \tag{48}$$

with

$$\overline{\mathrm{cmmse}}_\phi(t,\cdot) = \frac{1}{t}\int_0^t \mathrm{cmmse}_\phi(s,\cdot)\,ds$$

and

$$\overline{\mathrm{ncmmse}}_\phi(t,\cdot) = \frac{1}{t}\int_0^t \mathrm{ncmmse}_\phi(t,s,\cdot)\,ds$$

the time-averaged CMMSE and NCMMSE over $[0,t]$, respectively.

Proof. The result follows directly from Remark 3.1 and Theorem 6.1. □

For the next corollary, denote as usual by $C^k(A)$ the space of functions $h : A \to \mathbb{R}$ with continuous $k$-th order partial derivatives in $A \subset \mathbb{R}^n$. Partial derivatives at a boundary point are understood to be taken from the right or from the left, accordingly. In the same way, denote by $C^0(A)$ the space of functions $h : A \to \mathbb{R}$ continuous in $A \subset \mathbb{R}^n$, with a convention analogous to the previous one at boundary points.
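Corollary 6.2. Assume the same hypotheses as in Corollary 6.1. Assume furthermore that $\mathrm{cmmse}_\phi(\cdot,\cdot) \in C^1([0,T] \times \mathbb{R}_+)$ and that $\mathrm{ncmmse}_\phi(\cdot,\cdot,\cdot)$ is differentiable w.r.t. its first and third arguments, $t \in [0,T]$ and $r \in \mathbb{R}_+$ respectively, with

$$\frac{\partial}{\partial t}\mathrm{ncmmse}_\phi(\cdot,\cdot,\cdot) \quad \text{and} \quad \frac{\partial}{\partial r}\mathrm{ncmmse}_\phi(\cdot,\cdot,\cdot)$$

both belonging to$^{20}$ $C^0(V)$. Then,

$$I_i(\cdot,\cdot) \in C^2([0,T] \times \mathbb{R}_+)$$

with second-order partial derivatives given, for each $(t,r) \in [0,T] \times \mathbb{R}_+$, by

$$2\frac{\partial^2}{\partial t^2}I_i(t,r) = r\frac{\partial}{\partial t}\mathrm{cmmse}_\phi(t,r),$$

$$2\frac{\partial^2}{\partial r^2}I_i(t,r) = \int_0^t \frac{\partial}{\partial r}\mathrm{ncmmse}_\phi(t,s,r)\,ds,$$

$$2\frac{\partial^2}{\partial t\,\partial r}I_i(t,r) = \int_0^t \frac{\partial}{\partial t}\mathrm{ncmmse}_\phi(t,s,r)\,ds + \mathrm{cmmse}_\phi(t,r),$$

and

$$2\frac{\partial^2}{\partial r\,\partial t}I_i(t,r) = r\frac{\partial}{\partial r}\mathrm{cmmse}_\phi(t,r) + \mathrm{cmmse}_\phi(t,r).$$

In particular,

$$r\frac{\partial}{\partial r}\mathrm{cmmse}_\phi(t,r) = \int_0^t \frac{\partial}{\partial t}\mathrm{ncmmse}_\phi(t,s,r)\,ds, \tag{49}$$

for each $(t,r) \in [0,T] \times \mathbb{R}_+$ as well.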



Before giving the proof of the corollary we make the following remarks. 

Remark 6.3. It is easy to see that, under the assumptions of Corollary 6.2, equation (49) can also be obtained from (48) by multiplying both sides of (48) by $rt$ and then differentiating the resulting equation with respect to $r$ and $t$ (using Leibniz's rule as before), and therefore relationship (48) corresponds to an integrated version of (49).

Remark 6.4. The smoothness requirements on $\mathrm{cmmse}_\phi$ and $\mathrm{ncmmse}_\phi$ in Corollary 6.2 can be guaranteed under appropriate corresponding smoothness requirements on the coefficients $F$ and $G$, for certain input-trajectory spaces $A_T$ and structures of $F$ and $G$ [25].



$^{20}$ Recall $V = \{(t,s,r) \in \mathbb{R}^3 : t \in [0,T],\ s \in [0,t],\ r \in \mathbb{R}_+\}$.
Proof. That $I_i(\cdot,\cdot) \in C^2([0,T] \times \mathbb{R}_+)$ with the corresponding given expressions for the second order partial derivatives follows directly from the assumptions, Remark 3.1, the expressions for the first order partial derivatives in Theorem 6.1, and the use of Leibniz's rule for the differentiation of integrals [39], along with the fact that $\mathrm{ncmmse}_\phi(t,t,r) = \mathrm{cmmse}_\phi(t,r)$ for each $t \in [0,T]$ and $r \in \mathbb{R}_+$. The last claim of the corollary, equation (49), follows from the fact that $I_i(\cdot,\cdot) \in C^2([0,T] \times \mathbb{R}_+)$, since then we have

$$\frac{\partial^2}{\partial t\,\partial r}I_i(t,r) = \frac{\partial^2}{\partial r\,\partial t}I_i(t,r)$$

for each $(t,r) \in [0,T] \times \mathbb{R}_+$. □
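The identity linking the $r$-derivative of the CMMSE to the integrated $t$-derivative of the NCMMSE can be illustrated on a toy case. The sketch assumes a constant Gaussian input $X_t = X$, $X \sim N(0,1)$, in an AWGNC, which is an assumption made for this example and not a case treated in the paper. There $\mathrm{cmmse}(t,r) = 1/(1+rt)$ and $\mathrm{ncmmse}(t,s,r) = 1/(1+rt)$ for $s \in [0,t]$, so the partial derivatives are available in closed form and are coded directly; the names are hypothetical.

```python
def d_cmmse_dr(t, r):
    """Partial derivative in r of cmmse(t, r) = 1 / (1 + r t)."""
    return -t / (1.0 + r * t) ** 2


def d_ncmmse_dt(t, s, r):
    """Partial derivative in t of ncmmse(t, s, r) = 1 / (1 + r t), s in [0, t]."""
    return -r / (1.0 + r * t) ** 2


def left_hand_side(t, r):
    """r * d/dr cmmse(t, r)."""
    return r * d_cmmse_dr(t, r)


def right_hand_side(t, r, n=1000):
    """int_0^t d/dt ncmmse(t, s, r) ds, by the midpoint rule in s."""
    step = t / n
    return sum(d_ncmmse_dt(t, (k + 0.5) * step, r) for k in range(n)) * step


if __name__ == "__main__":
    print(left_hand_side(1.5, 2.0), right_hand_side(1.5, 2.0))
```

For this input both sides reduce to $-rt/(1+rt)^2$, so the printed values should coincide up to rounding.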



7 Further Extensions and Results 

As mentioned earlier, it is possible to give $n$-dimensional ($n > 1$) counterparts of all the 
results established in the paper, where system (6) takes the form 

$$Y_t^r = \sqrt{r} \int_0^t F(s, X, Y^r)\,ds + \int_0^t G(s, Y^r)\,dW_s, \quad t \geq 0, \qquad (50)$$

with $r \in \mathbb{R}_+$, $X = (X^i)_{i=1}^n$ and $Y^r = (Y^{r,i})_{i=1}^n$ the $\mathbb{R}^n$-valued input and output processes, 
respectively, $W = (W^i)_{i=1}^n$ an $\mathbb{R}^n$-valued standard Brownian motion independent of $X$, 
$F(\cdot,\cdot,\cdot) = (F^i(\cdot,\cdot,\cdot))_{i=1}^n$ an $n$-dimensional vector of $\mathbb{R}$-valued measurable non-anticipative 
functionals, and $G(\cdot,\cdot) = (G^{i,j}(\cdot,\cdot))_{i,j=1}^n$ an $n \times n$ matrix of $\mathbb{R}$-valued measurable 
non-anticipative functionals as well. The corresponding Radon-Nikodym derivatives, and 
consequently the input-output mutual information, are then defined in terms of the measures 
that the different processes involved induce on the corresponding multi-dimensional space, 
for example $(C_T^n, \mathcal{B}_T^n)$, the space of $\mathbb{R}^n$-valued continuous functions on $[0,T]$ equipped with 
the corresponding $\sigma$-algebra of cylinder sets, similarly to the case $n = 1$ considered so far. The 
required changes in the statement of the corresponding $n$-dimensional results are 
straightforward, with the functional $\phi$ now taking the form 

$$\phi(t,f,g) = (\phi_i(t,f,g))_{i=1}^n = [G(t,g)]^{-1} F(t,f,g), \qquad (51)$$

$t \in [0,T]$, $f \in \mathcal{A}_T^n$, $g \in C_T^n$, and with condition (III), equation (12), now interpreted as the 
requirement of 

$$H(t,g) = G(t,g)\,[G(t,g)]^*,$$

with $[\cdot]^*$ denoting the transpose of the corresponding matrix (or vector), being uniformly 
elliptic [40], i.e., such that there exists $\delta \in (0,\infty)$ with 

$$\sum_{i,j=1}^n H^{i,j}(t,g)\,\gamma_i \gamma_j \geq \delta \|\gamma\|^2$$

for all $t \in [0,T]$, $g \in C_T^n$, and $\gamma = (\gamma_i)_{i=1}^n \in \mathbb{R}^n$, where $\|\cdot\|$ denotes the usual Euclidean 
norm in $\mathbb{R}^n$, i.e., $\|y\|^2 = y^* y$ for each $y \in \mathbb{R}^n$. All vectors in $\mathbb{R}^n$ and vector-valued processes 
should be regarded as column vectors. Note that the uniform ellipticity of $H$ in particular 
implies the invertibility of $G$. All other requirements in Section 2, Subsection 2.1, on 
the functionals $F$ and $G$, or on the processes involved (such as $X$), are understood to 
hold in the $n$-dimensional setting in a componentwise (or elementwise, in the case of matrices) 
fashion. Equivalently, they can be written in terms of the Euclidean norm $\|\cdot\|$, the 1-norm 
$\|y\|_1 = \sum_{i=1}^n |y_i|$, $y = (y_i)_{i=1}^n \in \mathbb{R}^n$, or the Frobenius matrix norm $\|\cdot\|_F$, given by 

$$\|A\|_F^2 = \sum_{i,j=1}^n A_{i,j}^2, \qquad A = (A_{i,j})_{i,j=1}^n,$$

accordingly. Similarly, conditions such as 

$$\int_0^T E\left[\phi^2(t,X,Y^r)\right] dt < \infty$$

are also interpreted as holding in a componentwise fashion or, equivalently, in terms of 
the Euclidean norm, 

$$\int_0^T E\left[\|\phi(t,X,Y^r)\|^2\right] dt = \sum_{i=1}^n \int_0^T E\left[\phi_i^2(t,X,Y^r)\right] dt < \infty.$$

In the same way, MMSEs are written in terms of the Euclidean norm, for instance 

$$\mathrm{cmmse}_\phi(t,r) = E\left[\left\|\phi(t,X,Y^r) - E\left[\phi(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right\|^2\right] = \sum_{i=1}^n E\left[\left(\phi_i(t,X,Y^r) - E\left[\phi_i(t,X,Y^r) \mid \mathcal{F}_t^{Y^r}\right]\right)^2\right],$$

$t \in [0,T]$, $r \in \mathbb{R}_+$. The analogous definitions for system (50) to be a quasi-SNR-system, 
an SNR-system, or a strong-SNR-system also follow directly from those of the one-dimensional 
case, with $\phi$ given by (51) and the obvious replacement of $\phi^2$ by $\|\phi\|^2$. 
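
The pointwise content of the ellipticity requirement and of the functional in (51) can be illustrated numerically. The following is only a minimal sketch for $n = 2$, under several assumptions that are not part of the paper: $H$ is taken to be $G G^*$, the matrix $G(t,g)$ and vector $F(t,f,g)$ are frozen at hypothetical numerical values for a single $(t,f,g)$, and the helper names `gram`, `min_eigenvalue_sym2`, `quad_form` and `solve2` are illustrative. Uniform ellipticity requires the bound to hold with one $\delta$ for all $t$ and $g$, which a single evaluation like this one does not establish.

```python
import math

def gram(G):
    """Return H = G G^T for a square matrix G given as a list of rows."""
    n = len(G)
    return [[sum(G[i][k] * G[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def min_eigenvalue_sym2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)

def quad_form(H, gamma):
    """Quadratic form sum_{i,j} H[i][j] * gamma[i] * gamma[j]."""
    n = len(gamma)
    return sum(H[i][j] * gamma[i] * gamma[j]
               for i in range(n) for j in range(n))

def solve2(G, F):
    """Solve G phi = F for a 2x2 matrix G by Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    if det == 0.0:
        raise ValueError("G is singular, so phi = G^{-1} F is undefined")
    return [(F[0] * G[1][1] - G[0][1] * F[1]) / det,
            (G[0][0] * F[1] - G[1][0] * F[0]) / det]

# Hypothetical frozen values of G(t, g) and F(t, f, g) at one point (t, f, g).
G = [[2.0, 0.0], [1.0, 1.0]]
F = [4.0, 3.0]

H = gram(G)                     # assumed form H = G G^*
delta = min_eigenvalue_sym2(H)  # best pointwise ellipticity constant
phi = solve2(G, F)              # phi = G^{-1} F, as in (51)

print("H =", H)
print("pointwise ellipticity constant:", delta)
print("phi =", phi)
```

The smallest eigenvalue of the symmetric matrix $H$ is the largest constant for which the quadratic-form bound holds at that point, so a strictly positive value is consistent with the invertibility of $G$ there.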

It is also possible to consider system (50) in the case when $r = (r_i)_{i=1}^n \in \mathbb{R}_+^n$, with $\sqrt{r}$ 
in (50) replaced by the diagonal matrix 

$$\mathrm{diag}(\sqrt{r_1}, \ldots, \sqrt{r_n}),$$

and to give relationships involving not only time derivatives of the input-output mutual 
information but also, as in the AWGNC case, derivatives with respect to each component 
$r_i$ of $r$. We do not give the details since, in light of the results already stated in the paper, 
this extension follows by the same line of arguments as in the AWGNC case [3]. 
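
For intuition, the $n$-dimensional system with componentwise signal-to-noise ratio parameters can be discretized in time. The sketch below uses the Euler-Maruyama scheme, which is a standard numerical method and not something taken from the paper. It also makes simplifying assumptions: the drift and diffusion depend only on the current values of the input and output rather than on their whole past trajectories, and the function name `euler_maruyama` and its arguments are hypothetical.

```python
import math
import random

def euler_maruyama(r, F, G, x_path, dW, dt):
    """Euler-Maruyama path of dY = diag(sqrt(r)) F(s, X, Y) ds + G(s, Y) dW.

    r      : list of n nonnegative SNR parameters (one per component)
    F      : callable (s, x, y) -> list of n floats   (hypothetical drift)
    G      : callable (s, y)    -> n x n list of rows (hypothetical diffusion)
    x_path : input samples on the time grid, one list of n floats per step
    dW     : Brownian increments, one list of n floats per step
    dt     : time step
    """
    n = len(r)
    sqrt_r = [math.sqrt(ri) for ri in r]
    y = [0.0] * n                  # both integrals vanish at t = 0
    path = [list(y)]
    for k, (x, dw) in enumerate(zip(x_path, dW)):
        s = k * dt
        drift = F(s, x, y)
        diffusion = G(s, y)
        y = [y[i] + sqrt_r[i] * drift[i] * dt
             + sum(diffusion[i][j] * dw[j] for j in range(n))
             for i in range(n)]
        path.append(list(y))
    return path

# Usage with hypothetical choices: F returns the input, G is the identity,
# which mimics a two-dimensional additive white Gaussian noise channel.
rng = random.Random(0)
dt, steps = 0.01, 100
r = [4.0, 9.0]
x_path = [[1.0, -1.0] for _ in range(steps)]
dW = [[rng.gauss(0.0, math.sqrt(dt)) for _ in r] for _ in range(steps)]
identity = [[1.0, 0.0], [0.0, 1.0]]
path = euler_maruyama(r, lambda s, x, y: x, lambda s, y: identity,
                      x_path, dW, dt)
print("Y at final time:", path[-1])
```

Passing the Brownian increments explicitly, instead of drawing them inside the function, keeps the scheme deterministic for a given noise realization and makes it possible to compare outputs for different values of $r$ driven by the same noise.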

Finally, and again in light of the results already stated in the paper, it is also possible 
to study the asymptotics of the input-output mutual information and MMSEs, writing 
expressions analogous to those of the AWGNC case for high and low values of $r \in \mathbb{R}_+$, and to find 
representations of other information measures, such as entropy and divergence, in terms of 
purely estimation-theoretic quantities, again analogous to the AWGNC case [3]. The details 
are left to the reader. 



8 Conclusion 



In this paper we have considered a general stochastic input-output dynamical system, 
covering a wide range of stochastic system models appearing in engineering applications. 
In this general setting, we have established relationships linking information-theoretic and 
estimation-theoretic quantities. In particular, precise equations revealing the connection 
between input-output mutual information and minimum mean causal and non-causal square 
errors were found for this setting, as analogues of previously known results 
in the context of additive Gaussian noise communication channels. Furthermore, they 
were stated in this broader setting not only in terms of time-averaged quantities, but 
also in terms of their time-instantaneous, dynamical counterparts. In extending those 
relationships we have also identified conditions for a signal-to-noise ratio parameter to be 
meaningful, and characterized different classes of system models in those terms. 

We believe the results presented in the paper will find interesting future applications in 
several engineering fields, as they evidence that the deep connection between information 
theory and estimation theory goes beyond communication systems, encompassing indeed 
a whole range of dynamical systems of great use and interest in the stochastic modelling 
community. 